Voice Controlled Computing – New Opportunities and a Big Challenge


Several times in the past this blog has discussed the work being done by large and small companies with deep pockets to usher in the age of voice-controlled computing. For the most part we have focused on the advantages consumers would realize from being able to leverage computing power with simple voice commands, and on the additional use cases connected devices would help satisfy as a result. As voice-controlled devices continue to weave into our lives, two areas we haven’t discussed have started to come into focus: one an opportunity for insights, the other a challenge for product developers and a barrier to widespread adoption.

Lessons from the Echo

We have written several times about the advantages tablet devices offer over display-less voice-controlled computers like the Amazon Echo: the richness of a response is multiplied by a screen, where information, imagery, and video can be offered in addition to audio. The Ambility team still believes this to be a significant advantage, but we have learned a great lesson from the Echo that we didn’t fully anticipate: a wealth of use cases can be addressed very well through audio-only responses, and these audio-based interactions open brand-new areas of insight for service and marketing providers to learn about their audiences.

Another lesson we learned from interacting with the Echo is that Siri and ‘OK Google’ are not really voice-controlled computing platforms. They offer great doorways into web content and experiences, but once you get there you have to rely on tapping and swiping to get what you want.

New Use Cases, New Opportunities for Data

Over the course of a long weekend the Ambility leadership team found themselves turning to the Echo, by summoning ‘Alexa,’ more and more to satisfy simple queries and to help with tasks around the house. Our computers and mobile devices have long helped us settle debates by getting that easy answer, but how many of us turn to those devices to set a timer for the bread we’re baking or to dim the lights before dinner? With the Echo these were tasks easily completed; by the end of the weekend we had forgotten where the light switches were and had never bothered to look for a kitchen timer.

Beyond those tasks we also turned to the Echo to play music, create a shopping list, and check traffic, but it was the mundane uses of the device, helping with dinner and managing the room’s heat and lighting, that stood out (Tom’s Guide also identified tuning your guitar and having Alexa act as your exercise coach as good uses of the product). These are tasks that most people do not complete using connected devices, and that have therefore gone unobserved by marketers and analysts. As voice-controlled devices penetrate more modern households and take on more applications, the opportunity (and burden) of harnessing this new data for insights will be vast.
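To make that opportunity concrete, here is a minimal sketch (in Python) of what capturing and ranking these interactions could look like. The event schema, intent labels, and helper names are our own illustration, not anything the Echo actually exposes.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class VoiceEvent:
    """One spoken request captured by an always-on device (illustrative schema)."""
    utterance: str   # raw transcription, e.g. "set a timer for the bread"
    intent: str      # classified task, e.g. "timer", "lighting", "music"
    timestamp: datetime = field(default_factory=datetime.now)

def top_household_tasks(events: list[VoiceEvent], n: int = 3) -> list[tuple[str, int]]:
    """Rank the tasks a household actually asks for: the kind of signal
    analysts never saw when those tasks were handled at a light switch."""
    return Counter(e.intent for e in events).most_common(n)

# The long weekend described above, reduced to data:
log = [
    VoiceEvent("set a timer for the bread", "timer"),
    VoiceEvent("dim the living room lights", "lighting"),
    VoiceEvent("dim the lights for dinner", "lighting"),
    VoiceEvent("play some dinner music", "music"),
]
print(top_household_tasks(log))  # [('lighting', 2), ('timer', 1), ('music', 1)]
```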

So overall the Ambility team liked the Echo and quickly adopted it for certain needs around the house, to a degree we never have with Siri or OK Google. Why is that?

The obvious answer is that Alexa was always available. We didn’t need to grab a phone or tablet and hold a button before asking for what we wanted; we only had to hail ‘Alexa’ and then make a request. The not-so-obvious answer is that the Echo “interface” is built for an audio-only interaction and does not default to older, tactile mechanisms of interactive experience.
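For readers curious what “always available” looks like in code, here is a minimal wake-word loop using the open-source SpeechRecognition package for Python. It is a sketch under stated assumptions: the wake word and handler are ours, and unlike a real Echo, which detects its wake word locally on dedicated hardware, this version sends every utterance to a cloud recognizer.

```python
# Minimal always-listening wake-word loop.
# Requires: pip install SpeechRecognition PyAudio
import speech_recognition as sr

WAKE_WORD = "alexa"  # illustrative wake word, not Amazon's implementation
recognizer = sr.Recognizer()

def handle_request(text: str) -> None:
    print(f"Request heard: {text}")  # route to a timer, lights, music, etc.

with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)
    while True:
        audio = recognizer.listen(source)  # block until a phrase is spoken
        try:
            text = recognizer.recognize_google(audio).lower()
        except (sr.UnknownValueError, sr.RequestError):
            continue  # unintelligible or network error; keep listening
        if text.startswith(WAKE_WORD):
            # Everything after the wake word is the actual request.
            handle_request(text[len(WAKE_WORD):].strip())
```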

Voice-Screen Interactions Require New UX Standards

Building “always on” capabilities is straightforward enough (Siri allows it when your iPad is plugged in), but enabling audio-only commands that interact with a screen display is a far trickier change. Touch-screen technology, historians tend to agree, was first developed in 1965 by E.A. Johnson at the Royal Radar Establishment in Malvern, UK, but it would be over forty years before mass audiences would have the chance to adopt it for anything beyond highly specific interactions. Apple’s release of the iPhone in 2007 introduced intuitive standards of interaction that developers could then apply to web and application design.

Siri, OK Google, and SoundHound’s new Hound product continue to enhance our devices’ ability to recognize voice commands and provide base-level responses. There is now even a program for making your laptop respond with J.A.R.V.I.S.-style displays like those Iron Man relies on, but for now all of these offerings assume some level of touch- or mouse-based interaction. For example, Siri and OK Google respond to most queries with a standard search results page (SRP) with no way to select a result by voice command. Siri’s voice-controlled messaging works well, but correcting or editing a message can be frustrating unless you resort to tapping and typing.
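To make that gap concrete, here is a sketch of what voice-selectable results might look like: a spoken ordinal resolved against the list on screen. Neither Siri nor OK Google exposes such a hook today; the function and vocabulary below are purely illustrative.

```python
import re

# Illustrative ordinal vocabulary for selecting an on-screen result by voice.
ORDINALS = {"first": 0, "second": 1, "third": 2, "fourth": 3, "fifth": 4}

def select_result(command: str, results: list[str]) -> str | None:
    """Resolve commands like 'open the third result' against a results list."""
    match = re.search(r"\b(first|second|third|fourth|fifth)\b", command.lower())
    if match:
        index = ORDINALS[match.group(1)]
        if index < len(results):
            return results[index]
    return None  # no usable ordinal; fall back to asking the user to repeat

results = ["Bread-baking basics", "Sourdough timer tips", "Dutch oven reviews"]
print(select_result("open the second result", results))  # Sourdough timer tips
```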

[Image: a J.A.R.V.I.S.-style desktop display]

Big Challenge, Big Opportunity

So far, the Amazon Echo has at least demonstrated that voice-controlled interactions have real usefulness and appeal. Even without a display screen, always-on audio computing is valuable. But the Echo hasn’t provided a way of navigating the rich and varied offerings the internet is so good at delivering, and a display screen would be a good start.

Tackling that interactive challenge is far more complicated than programming a voice-controlled timer, but the Echo showed us that intuitive, voice-controlled computing solutions will be a welcome addition to consumers’ connected worlds. And the payoff for the company that establishes those standards, the solutions designers who leverage them, and the analysts looking for more insights into their target audiences will be massive.

In the News – The Rise of Voice Controlled Computing

The Unblinking Eye

We all – at least those of us of a certain age or with a proclivity for Stanley Kubrick movies – remember HAL. The HAL 9000 was the on-board computer in Kubrick’s 1968 classic 2001: A Space Odyssey, a machine the hero interacted with by voice alone. For a display, HAL offered only a single red eye that glowed with unvarying consistency: a toggle switch that said only “I’m on,” even when Dave very much wanted it off.

Ignoring the sinister nature of that particular example, how close are we to having a HAL-like assistant we can turn to for the complex or mundane challenges of our home and work lives? If development and investment activity are any measure, then very soon we should indeed be surrounded by devices that respond to our voice commands more quickly and more helpfully than the voice-command systems companies use to provide “customer service” when we call them.

These new voice-controlled systems promise interactions that not only recognize what we’re saying, but can serve up articles, images, and video from any source connected to the internet and feed them back to us immediately – and in high-def. They recognize what we want without prompting us to “say or press 1” first. They claim to understand what we want based on how we’d ask a friend or colleague – but promise a more informed response.
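The difference between a “press 1” phone tree and that kind of understanding can be sketched in a few lines. The intents and keyword lists below are toy assumptions on our part; real assistants rely on trained language models, not lookup tables.

```python
# Toy contrast with an IVR menu: match free-form speech to an intent directly.
INTENTS = {
    "check_balance": ["balance", "how much", "in my account"],
    "report_outage": ["outage", "no power", "lights are out"],
}

def match_intent(utterance: str) -> str | None:
    """Return the first intent whose keywords appear in the utterance."""
    text = utterance.lower()
    for intent, keywords in INTENTS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return None  # unrecognized; a real system would ask a clarifying question

# "Say or press 1" becomes unnecessary when the request itself is enough:
print(match_intent("hey, how much is in my account?"))  # check_balance
```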

At the time of writing, companies as varied as SoundHound (a music search and recognition app company), Apple, Google, Microsoft, Amazon, and Conversant Labs (a company focused on solutions for the visually impaired) all have releases planned to deliver on-demand, voice-controlled computing solutions. If you doubt that voice-controlled interactions will soon be widely available, consider this from Wired magazine’s “We’re on the Brink of a Revolution in Crazy-Smart Digital Assistants” (Francesco Muzzi, September 2015): “It’s a classic story of technological convergence: Advances in processing power, speech recognition, mobile connectivity, cloud computing, and neural networks have all surged to a critical mass at roughly the same time. These tools are finally good enough, cheap enough, and accessible enough to make the conversational interface real – and ubiquitous.”

The Ambility team has more than a passing interest in this trend, as we hold intellectual property in a solution for flexibly positioning tablet computers, and with their mobile connectivity and rich displays, tablets seem destined to be among the main beneficiaries of voice-controlled computing. When smart digital assistants can be used hands-free, the value of hands-free tablet use will multiply.

Unlike the HAL 9000, tablets don’t offer an unblinking red light in response to your queries. They provide whatever best satisfies your need. After all, a voice can provide words in response to what you want, but a picture speaks… Well, you get the idea.

It seems clear that we are still at the very beginning of the voice-controlled computing era, and that significant barriers stand in the way of widespread adoption; standards of interaction for needs beyond straight search, and integration of voice commands into popular software and applications, are just two. But it also seems clear that voice-controlled interactions will help to multiply the use cases tablet computers can satisfy at home and in the office. And solutions for positioning tablets for hands-free use across those use cases will become more and more valuable.