Alexa, what’s my bank balance? The 2018 state of voice.
All of a sudden, voice assistants are everywhere: in our phones, cars, TVs, microwaves and refrigerators.
If you don’t have at least one Amazon Echo, Google Home or Apple HomePod in your house at this point you might be in the minority: voice assistants have moved into our everyday lives in a big way, and they’re the new norm.
Given the rate of adoption, and the expansion of voice APIs for the masses, we thought it was time to look at the market, how it’s growing and where voice is headed next. 2018 is officially the year of the voice assistant.
William Dersch’s Shoebox listened as the operator spoke numbers and commands such as “Five plus three plus eight plus six plus four minus nine, total,” and would print out the correct answer: 17.
In the decades since that demo, humans have dreamed of interacting with their devices in a more natural way, but it always felt a little far off. Science fiction like Star Trek, 2001: A Space Odyssey and Back to the Future Part II gave us visions of a future where we’d interact with the digital world just by speaking aloud, but it always seemed like nothing more than a fantasy.
There have been many attempts at building rich voice experiences, and you likely recall those from the 1990s best. Those tools required you to sit in front of a computer and dictate for hours to train them before use, and even then they remained unreliable.
The real innovations that pushed voice forward to where we are now aren’t entirely obvious: cloud computing and machine learning. Neither idea was particularly new, but the way they were embraced changed everything.
If you wanted to build a voice assistant in 1996, you’d need vast server rooms of your own to perform basic interpretation, which required a massive investment. In 2018, it’s as easy as clicking a few buttons on Amazon Web Services and, poof, you’ve got a massive, high-performance data center ready to go.
Cloud computing has revolutionized the way applications and ideas are built: before, you’d need at least some metal to run your voice service on, but now you can build a vast service without ever actually seeing a server.
Machine learning alongside cloud computing created a potent combination: suddenly developers had access to vast amounts of processing power to experiment with teaching a computer how to think, and larger data sets to feed it.
The theory behind machine learning has been around since at least the 1980s. Dr Hermann Hauser, scientist and director of Amadeus Capital, said in a presentation that many of the ideas used by modern machine learning were invented decades ago, but the raw power wasn’t available to do anything with them.
Once computers were equipped with an ability to grasp basic concepts, voice was inevitable. Siri, which was released in 2011, was likely the first ‘modern’ voice experience consumers had, and while it was impressive, it was obvious that the technology was nowhere near usable on an everyday basis yet.
While Siri was a great early demonstration of what voice assistants could do, it was easy to stump. Basic commands worked, but as soon as you asked it something unexpected, which happens as soon as humans feel comfortable, it would fail. Ultimately, the problem was that Siri wasn’t able to learn from its own mistakes until much later, in 2014.
It wasn’t until Amazon unveiled the Echo in 2014 that anyone started paying serious attention to voice again. By this point, neural networks were beginning to find their way into consumer applications and into the public eye, and it showed in the first reviews of Echo:
“Yet this is the future, I’m sure of it. Several times a day, the Echo blows me away with how well it converses, and how natural it feels to interact with a machine this way.”
Echo wasn’t just impressive because it was the first device on the market that made voice feel really natural, but also because of its hardware: the company combined far-field microphones with a decent speaker, and made it look good.
Far-field microphones in 2015 were a concept not many people were familiar with. The technology allows a device to combine microphones to increase the range in which it’s able to hear a voice, and to block out noises around it. Combined with audio processing improvements, it’s a potent technological leap: suddenly computers could hear and understand almost anywhere in a room, with a satisfying level of precision.
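To make the idea of combining microphones concrete, here is a minimal sketch of delay-and-sum beamforming, one textbook way to merge several microphone signals into one. It is an illustration under stated assumptions, not a description of what Echo actually runs: the four-microphone setup, the per-microphone sample delays and the simulated noise level are all invented for the example.

```python
import math
import random

def delay_and_sum(channels, delays):
    """Shift each channel by its arrival delay (in samples), then average them."""
    usable = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[i + d] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(usable)
    ]

def rms(signal):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

# Simulated scene (all values are assumptions): one voice reaches four
# microphones at slightly different times, and each adds its own noise.
random.seed(0)
n = 2000
voice = [math.sin(2 * math.pi * t / 40) for t in range(n)]
delays = [0, 3, 6, 9]  # hypothetical arrival delays, in samples
channels = [
    [(voice[t - d] if t >= d else 0.0) + random.gauss(0, 1.0) for t in range(n)]
    for d in delays
]

combined = delay_and_sum(channels, delays)

# Compare the leftover noise on one microphone with the noise after combining.
noise_single = rms([channels[0][i] - voice[i] for i in range(len(combined))])
noise_combined = rms([combined[i] - voice[i] for i in range(len(combined))])
print(f"noise on one microphone: {noise_single:.2f}")
print(f"noise after combining:   {noise_combined:.2f}")
```

After shifting, the voice lines up across all four channels while the noise does not, so averaging keeps the voice and shrinks the noise. That is the intuition behind hearing a speaker from across the room.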
The Echo came out of nowhere, at least to the consumer, and a whole new model of interaction was born overnight because Amazon was able to stand at the crux of three massive innovations intersecting with one another — it also, conveniently, runs the world’s largest cloud computing platform.
Modern voice assistants became possible because their makers were able to offload the heavy data-crunching required to interpret voice to their cloud brains. All your smart speaker does is listen for a hot word such as “OK Google” or “Alexa,” which opens the pipe to those online brains for real-time recognition.
Almost nothing is done locally by these devices, which brings prices down and makes it possible to build them in attractive, fabric-coated form factors for your kitchen.
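The split between the device and the cloud can be sketched in a few lines. This is a simplified illustration, not any vendor’s actual pipeline: the function name frames_sent_to_cloud, the use of short text strings to stand in for audio frames, and the fixed three-frame listening window are all assumptions made for the example. Real devices decide when you have stopped speaking rather than counting frames.

```python
from typing import Iterable, List

# Hypothetical wake words, matched locally on the device.
WAKE_WORDS = {"alexa", "ok google"}

def frames_sent_to_cloud(frames: Iterable[str], utterance_frames: int = 3) -> List[str]:
    """Return only the frames that follow a wake word; the rest never leave the device."""
    sent = []
    remaining = 0  # frames still to forward after the last wake word
    for frame in frames:
        if remaining > 0:
            sent.append(frame)  # the pipe is open: forward for recognition
            remaining -= 1
        elif frame.lower() in WAKE_WORDS:
            remaining = utterance_frames  # wake word heard: open the pipe
        # anything else is heard locally and discarded
    return sent

room_audio = ["dinner chat", "Alexa", "what's", "the", "weather", "more chat"]
print(frames_sent_to_cloud(room_audio))
```

In this sketch the only work done on the device is a cheap comparison against the wake words, which mirrors why the hardware can stay small and inexpensive.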
The current state of voice
With these developments in mind, let’s look at where we are in 2018 from the consumer’s perspective: voice went from a cute tool to a primary mode of interaction for the home. For the first time, people are comfortable using voice to interact with digital devices, and even prefer it.
This has been driven by aggressive competition between Google and Amazon. Echo was first to market, leaving Google reeling and ultimately leading it to invest billions in Home to build out what it sees as the next platform for search. If anything, Amazon Echo was Google’s first real existential threat, making Home all the more important.
As a result, we see a huge race to the bottom for voice, because it’s winner takes all.
What started out as Amazon Echo is now a multitude of products, including the smaller Echo Dot and the larger Echo premium speaker. Google has done the same, going down-market with Home Mini, and up-market with Home Max, which competes with Sonos and beyond. Apple is about to enter the game for the first time with the HomePod, which is set to ship in February.
The Consumer Electronics Show was the first visceral evidence of how much this space is worth to those fighting for a spot on your bench:
“The words “Hey, Google” are currently plastered along the outside of the city’s public transportation system (the Las Vegas Monorail) that will shuttle thousands of attendees into the conference center all week. It’s a bold statement from the Mountain View, Calif.-based company, and makes one thing clear to all attendees at CES: Google wants you to get used to interacting with its digital assistant.”
All of the players in the voice space are pouring millions into it because, ultimately, they must. Google discounted Home Mini by more than half over the holidays, and Amazon essentially gave Echo Dot away for free. Lower-end devices are a gateway drug into the entire ecosystem: you’re almost guaranteed to expand later, so it’s not a big deal to sell them at a loss.
If any one of these assistants ‘wins’ it means millions of people will turn to that device, every day, before any other interaction model. As Internet of Things devices become prevalent, these assistants become the gateway to your home, because they’re a natural way to interact with gadgets without pulling out your phone.
They also vacuum up data at an unprecedented scale.
Google and Amazon are fighting over this space because it’s a fantastic, friendly vehicle for capturing data — the new gold. By becoming intimate with you to the point you turn to your voice assistant first, before your phone, these companies start getting closer to understanding your thoughts, and ultimately, your intent.
Almost everything you say to Alexa and Home is crunched, and stored, for later. That voice data is a goldmine for both companies because they’re able to use it both to train future algorithms and to figure out how to get you to buy stuff.
Once you’re comfortable with voice, it gets even more interesting from there. The biggest advantage these devices have is they can make decisions on your behalf, while profiting from it, without your knowledge.
Here’s a theoretical example: imagine you’re planning to take an Uber to the office. When you ask Echo for a ‘ride to work’ it could, eventually, sell that term to the highest bidder and send whoever it feels like. Why would it default to Uber if Uber isn’t paying?
Just as Amazon did for the marketplace, thousands of brands will see their value diminished in a voice world, because assistants become the ultimate gatekeepers. Amazon, Google and Apple will decide who gets in front of you, and who doesn’t — and you probably won’t ever know.
Voice assistants are about to be everywhere. You probably have one sitting in the room you’re in now. But are we ready for this?
Privacy and your voice
The biggest challenge in voice is one that the biggest players aren’t really talking about: privacy.
Both Amazon and Google store recordings of your voice as you use their devices, and both companies are able to decrypt those recordings to perform analysis, ultimately creating the world’s biggest voice database.
In our rush to voice assistants, we’ve forgotten the importance of privacy, and what having this data at scale means in the future. While all of these improvements have been happening, it’s become near trivial to recreate someone’s entire voice using a computer and a handful of snippets. If that’s not terrifying, I don’t know what is.
There are additional privacy implications as well, due to the nature of how your voice is processed: we’re wiring hundreds of pieces of metadata up to the cloud, like our bank accounts, to use them with Alexa and Home, without really considering it.
As developers have rushed to enable the next big consumer experience, they’ve fallen over themselves to get experiences in your hands.
“Alexa, what’s my bank balance?” is a real command, available from multiple banks. It’s a legitimately useful use case for the user, but it’s also a great way for Amazon to figure out how much money you have on hand, and an even better way for an attacker to find out more information about your bank account.
“Dr Rita Singh from Carnegie Mellon University and her colleagues pieced together a profile of a serial US Coastguard prank caller solely from recordings of his voice. This included a prediction of his height and weight, and also the size of room he was calling from, leading to his apprehension by the authorities. Dr Singh’s team are using this research to identify a person’s use of intoxicants or other substances, and also the onset of various medical conditions the speaker may not even be aware they possess.”
The only major voice player to advertise itself as encrypting your voice, identity and any associated data is Apple. As with Siri on the iPhone, Apple advertises HomePod as a privacy-focused device:
“Only after “Hey Siri” is recognized locally on the device will any information be sent to Apple servers, encrypted and sent using an anonymous Siri identifier.”
In other words, Apple won’t know who you are, and won’t be able to do much more with that data once it’s left your home. That claim, however, doesn’t paint the complete picture: because Apple doesn’t process your requests locally, your voiceprint is still in the cloud, and they could almost certainly link it back to you if they were forced to.
The practices Apple uses add a layer of security, but don’t solve the problem: your data, and voice, now live in a cloud somewhere. Eventually, if Apple wants to move beyond relying on a local iPhone to process integrations, it’ll need to associate that data somehow and likely backpedal on those claims in order to provide a connected experience.
So, what about the competition? Amazon doesn’t detail what it does with Alexa. Google, for its part, says it encrypts data, but it’s also the one holding the keys. As a result, we don’t really know how far that promise of ‘encryption’ truly extends:
“Your security comes first in everything we do. If your data is not secure, it is not private. That is why we make sure that Google services are protected by one of the world’s most advanced security infrastructures. Conversations in Google Home are encrypted by default.”
Siri, which has improved in recent years, is clearly behind in the voice assistant race as a result of this limited access to data: it’s still unable to infer basic human ways of interacting with information, such as saying “Where is that?” after asking “What’s a great taco spot nearby?”
If you had told people just a few years ago that you were going to place an always-on microphone in their home, they’d have balked, and refused. Now, it’s increasingly common, and people don’t seem to be concerned about the impact of that on their privacy — but Apple’s bet is that they will.
What remains to be seen is if Apple’s bet on that privacy will matter. While Apple is just taking its first steps with HomePod, Amazon and Google are busy putting their assistants in everything from cars to microwaves.
Soon, every device around you might be listening. Are you ready for that?
Where to from here?
Voice is the new interface, and isn’t going away anytime soon. For years, we’ve chased interacting with our computers in a more natural way, and the floodgates are open. So what next?
Privacy is the final frontier, and it’ll be a huge theme for voice assistants throughout 2018. GDPR, the European Union’s biggest new piece of data-protection legislation in decades, may drive that conversation forward, as it raises many questions about whether smart voice applications can be compatible with strong privacy law at all.
Companies will now have to ask for consent in simple terms, rather than buried in legalese terms and conditions. This creates many challenges, in particular for cloud-based voice assistants. Voice is considered to be personal data, therefore devices that listen ambiently should in theory ask everyone in the room for consent before sending their voice to the cloud.
“Imagine the nightmare of having 10 people over for dinner, and having your Google Home device asking each of them for consent!”
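As a rough sketch of what that consent requirement could look like in software, the snippet below only allows audio to be uploaded when every person heard has agreed. Everything here is hypothetical: the ConsentRegistry class, its method names and the speaker labels are invented for illustration, and the sketch assumes the device can already tell who is speaking, which is a hard problem in its own right. It is not legal guidance on GDPR.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set

@dataclass
class ConsentRegistry:
    """Hypothetical record of which speakers agreed to cloud processing."""
    consented: Set[str] = field(default_factory=set)

    def grant(self, speaker_id: str) -> None:
        self.consented.add(speaker_id)

    def revoke(self, speaker_id: str) -> None:
        self.consented.discard(speaker_id)

    def may_upload(self, speakers_heard: Iterable[str]) -> bool:
        """Allow upload only if at least one speaker was heard and all consented."""
        speakers = list(speakers_heard)
        return bool(speakers) and all(s in self.consented for s in speakers)

registry = ConsentRegistry()
registry.grant("host")

print(registry.may_upload(["host"]))                  # only the host spoke
print(registry.may_upload(["host", "dinner_guest"]))  # a guest who never agreed
```

The design choice worth noting is the default: when the device is unsure who it heard, or hears someone who has not agreed, the audio stays local.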
Over the coming year, it’s likely that questions of consent and voice security will become a large part of the discussion around voice assistants. With GDPR, citizens of the EU will have the right to know where and when their data is being used, and their consent will be required for expanded use of that stored data. It doesn’t matter if you’re building an experience from the US for EU customers: you’re still bound by the same rules.
Right now, most APIs for voice recognition are cloud-based, provided by Amazon and Google. This presents challenges for businesses looking to build experiences for their own apps with privacy in mind, especially with GDPR in the picture.
Local-only APIs and on-premises solutions do exist, and may be worth considering as these concerns become even more important throughout 2018. Your customers may demand the peace of mind, and guaranteeing a level of predictable privacy is good business.
With Google, in particular, focusing almost all of its energy on voice as the next frontier for search, these questions are going to become more pressing. If we’re to imagine a future in which we’re talking to computers all day, like in the movie Her, we need to understand what happens with our voice once it leaves the room and goes online.
It’s clear that voice is here to stay, and we’ll need to get comfortable with that reality for the foreseeable future. Privacy, especially when it comes to voice, is paramount, and the question really is wide open with consumer voice: where is the line?