Interfacing through voice, the race to dominate the home




Our vision of the smart home today is a series of disaggregated interactions with smart devices that control the domestic flow of a household, from lighting ambience and temperature to front-door security. However, Big Tech’s move into the smart home space reflects a vastly different approach. The consumer-focused Big Tech companies, Google, Amazon and Apple, are interested in the smart home as an extension of the consumer’s interaction with the digital world, where their core propositions lie. Currently, a consumer’s interactions with those core propositions, whether search, commerce or software, take place through the mobile or the desktop. An individual in the privacy and comfort of the home represents an opportunity to create a touchpoint that transcends both. However, a disaggregated bundle of smart devices, while utility-driven, doesn’t tap into the full potential of a truly high-tech smart home ‘experience’. So what is the lucrative future Big Tech is chasing in consumer homes?

All three consumer tech companies have approached the consumer in the home in the same way: an AI assistant paired with a smart speaker, a voice portal to the digital world. In thinking about interactions that transcend mobile or desktop, voice is an almost natural evolution. It is intuitive, hands-free and eminently suited to the quiet and privacy of the home. For this reason, dominating voice-based interactions has become the front line in the battle for a lead on smart homes. While the market is currently split largely between Amazon’s Alexa/Echo and Google’s Assistant/Home, Apple has made a tentative first step with the HomePod. The real motivator is to become the gatekeeper of our digital interactions in the home insofar as they evolve beyond the mobile and the desktop.

The voice portal: a defensive and offensive strategy

For Big Tech the voice portal represents two-fold value. First, it helps to protect their existing core propositions from erosion. For example, Google dominates search and will want to retain that dominance even as search evolves to include voice. Similarly, Google and Apple’s roles as gatekeepers of software through the Android and iOS app stores, and Amazon’s dominance in e-commerce, are protected by capturing any evolution of these propositions through voice.

The second value is the more important one to pursue: the voice portal and all of its interactions represent an opportunity to create new and fertile fields in which Big Tech can laterally expand its core propositions. The closest analogue to this lateral expansion is perhaps the app store. The mobile app store acted as the gateway to the consumer’s experience of the mobile phone. Similarly, when wearables became mainstream, smartwatch-specific app stores, built around the smartwatch’s particular mode of interaction, determined the consumer’s experience of wearables. In a similar vein, the AI assistant and speaker combination acts as the app store equivalent for voice-specific interactions. Just as existing digital services built for the mobile and the smartwatch, they will build for voice by integrating with voice-enabled AI assistants.

Smart home devices versus a smart home experience 

Smart home devices can range from a thermostat to a smart fridge, and many of the use cases require specific hardware that performs some type of discrete action. A smart home experience, on the other hand, requires an underlying cohesion that holds the home ecosystem together and enhances the resident’s life; it extends beyond isolated devices that perform isolated functions. We can think of the voice portal as the underlying system that provides this cohesion. To do so, it needs to deliver an experience as robust and desirable as the digital experience currently available to us through phones and computers. When users feel that notifications on their smart speaker are equal in value to the notifications on their phone, the portal really opens up to allow for immense value capture. That potential can then be channelled profitably through killer applications tailored to the voice portal. But let’s take a deeper look at why the experience, as opposed to discrete function, lends itself to greater adoption.

For the voice portal to reach a broad audience, users have to be convinced of the added value of voice interactions. To do this, the voice portal will need applications that add value in a meaningful way, where the convenience and experience outweigh the cognitive expense of the transition to voice. To understand this, we can break voice interaction down into tiers. While the more complex interactions carry a higher cognitive burden, they are also the most lucrative opportunities to create an experience rather than an isolated function.

Tier 1: Routine one-way interaction

Convenience: the smart device feeds you information or acts based on preset prompts – e.g. lights turn on at a certain hour, the weather is read out every morning, etc.

Cognitive Burden: little to no cognitive expense on the part of the consumer

Tier 2: Two-way interaction

Convenience: you feed information to the device spontaneously and the device reacts – e.g. set an alarm, meeting reminder etc.

Cognitive Burden: cognitive expense of trusting the device to execute on commands

Tier 3: Consumer interaction

Convenience: purchases through the voice device – e.g. order groceries, pay bills etc.

Cognitive Burden: cognitive expense of trusting the device with the financial element of the consumer’s life

Tier 4: Push to public interaction

Convenience: you feed information to platforms on the internet that will be visible to others – e.g. interact with social media through voice dictation

Cognitive Burden: cognitive expense of trusting the device to represent the consumer’s self and reputation appropriately to the public

When thinking about the first three types of interaction (one-way, two-way and consumer), it becomes clear that the original experience remains largely unchanged except for the added convenience of not needing to pick up a device and do it manually: using voice control to turn on the lights, set an alarm or order groceries, for example. But ‘push to public’ interactions exhibit characteristics that enhance the existing experience of activities like social media interaction. For example, Facebook might allow for important notifications, status updates and voice dictation for posts. More importantly, it could give notifications on social signalling, such as the number of likes, comments and other elements that the user associates with their social capital.

In this way the digital social experience becomes a natural extension of the user’s home life, providing a more plugged-in, socially intuitive voice experience that reflects traditional social interactions. The current digital experience of social media is static, in the sense that it only exists when the user taps into it. The voice portal would allow users who prefer it to integrate their digital and physical social experiences more seamlessly.

Experience-based functions are better placed to deliver additional value capture because they increase touchpoints for existing services while enhancing the experience of those services. User-based services are then probably best positioned to be the killer applications for voice portals, much as they are for the browser and mobile today. While smart devices and discrete functions are definitely useful, their value-add remains scattered without the cohesion of voice-enhanced experiences. It then becomes clear why Big Tech is in a race to dominate the home: whoever wins becomes the gatekeeper to all the value that funnels through the voice portal as users make that transition.

Down the Rabbit Hole

1. The execution of voice as a human-centric user interface will determine the adoption of voice-based technologies  

“Interfaces to digital systems of the future will no longer be machine-driven. They will be human-centric,” explains Werner Vogels, Amazon’s chief technology officer, “We can build human natural interfaces to digital systems and with that, a whole environment will become active.”

“VUI [Voice-user interface] allows for hands-free, efficient interactions that are more ‘human’ in nature than any other form of user interface. ‘Speech is the fundamental means of human communication,’ writes Clifford Nass, Stanford researcher and co-author of Wired for Speech, ‘…all cultures persuade, inform and build relationships primarily through speech.’ In order to create VUI systems that work, developers need to fully understand the intricacies of human communication.”

The benefits of voice over traditional user interfaces include efficiency and convenience (hands-free), a more human experience, and the ability to imbue interactions with personality and tone, creating greater brand affinity.

“Google Home is totally au fait with pop culture references, from Star Trek to Sir Mix A Lot, as is Amazon’s Alexa. A more personable tone helps users to forgive those moments when virtual assistants are unable to complete tasks or answer questions that an actual human would have no problem with.”

There are, however, multiple roadblocks to an immersive and coherent implementation of voice-based technologies.

  • Discovery and retention – With only 31% of Amazon Alexa/Echo’s 7,000+ skills having more than one review, there is an issue of discovery and retention when interacting with voice assistants in their current iteration.
  • Understanding limitations – Humans aren’t used to following the strict linguistic rules needed to communicate effectively with a voice assistant, so setting expectations about these limitations can reduce friction.
  • Visual feedback – Audio feedback without a visual cue leaves humans confused. Alexa’s blue light ring indicates its various states, ranging from ‘do not disturb mode’ to ‘getting ready to respond’, and is a quick if somewhat limited response to this issue.
  • Natural Language Processing (NLP) – While NLP research and development continues to chip away at the problem of bringing machines up to speed on the complexities of human communication, we are not quite there yet.

“Regional accents, slang, conversational nuance, sarcasm… some humans struggle with these aspects of communication, so at this point can we really expect much more from a machine?” 

Source: How Voice User Interface is taking over the world, and why you should care – Good Rebels for Medium

2. The reality of ambient computing in the interim will likely be stifled by the siloed ecosystems that emerge from Big Tech’s race for dominance

“Ambient computing, simply put, is the difference between a world in which technology exists and can be used with active effort and a world so seamlessly integrated with technology, you barely notice its existence or remember a time before it. While it seems a distant reality we are definitely already on this trajectory – it’s just that the road is paved with potholes.

The level of interactivity and coordination required [for ambient computing] between technology is unprecedented, and we know the tech giants aren’t big fans of playing house…There is an inherent incentive to follow the path of siloed ecosystems, a path that stands at odds with ambient computing. This incentive is rooted in the nature of the relationship between consumers and big tech. Each consumer represents a plethora of touchpoints to the digital world, and big tech is in the business of siphoning as many of these touchpoints into their own ecosystems as possible. This fight for dominance creates neck to neck competition between the larger organisations.” 

Source: Walled gardens: Ambient computing stifled by the race for dominance – 4th Quadrant

