Intelligent Digital Assistants (IDAs) or voice-activated smart devices such as Amazon’s Echo and Google Home have become an essential part of today’s smart life. We use them in our homes (e.g. for online searches, weather queries and directions) as well as in our offices (e.g. for recording meetings) to make our lives smarter.
It’s a Matter of Convenience
Indeed, voice technology is sweeping our world and transforming our lives. IDAs and voice-activated televisions (smart TVs) will soon be commonly used in our daily lives. Recent forecasts show that 50% of all internet searches will be voice searches by 2020, and that there will probably be more digital assistants than humans by 2021. Research by J. Walter Thompson and Mindshare shows that efficiency is the main reason for using voice: users’ brain activity is lower when voice is used than when touch or typing are used, which suggests that voice is more intuitive than any other means of communication. Common tasks for regular voice users (i.e. those who use voice services at least once a week) are “online searches, finding information about a specific product, asking for directions, asking questions, finding information about a specific brand or company, playing music, checking travel information, setting alarms, checking news headlines and home management tasks”.
The Right to Privacy.
Privacy issues in technology were first raised as far back as 1890 by two legal scholars in possibly the most influential privacy article, “The Right To Privacy”, in which they examined whether the laws of the time protected the individual’s privacy. They wrote the article mainly in response to the rise of the “snapshot” and its use in photographing people secretly or without their consent: “Instantaneous photographs and newspaper enterprise have invaded the sacred precincts of private and domestic life.” The article is considered the main foundation of American privacy law, and since its publication privacy laws have been passed in some US states to protect individuals. Today, nearly 130 years later, drones fitted with cameras allow anyone to spy from above, and new privacy laws are being passed in the US to limit and govern their use.
Today’s technology is affecting the privacy of individuals on a daily basis, through the use of smartphones and social media: photos captured by smartphones are shared in social media websites making them susceptible to breach by hackers. In addition to these privacy concerns about photos shared in the cloud, the rising use of cloud-based voice recognition systems such as IDAs and smart TVs has added another layer of privacy issues, sneaking up on people right inside their homes.
For many, there is a belief that a “magic pipe” exists between their Alexa-type device and the ultimate provider of information, very much like typing text into a browser and getting a webpage direct from, say, a weather website.
The main privacy problem with voice is that voice data is processed online in the cloud, which enables the cloud provider to record and store it. This makes the data vulnerable to breaches by external hackers as well as by the cloud server itself. In fact, the cloud provider acts as the conduit of all information to and from the consumer, which could include sensitive financial and health information. The SSL “padlock” that we see against many websites, protecting data-in-transit, has no equivalent in the voice-activated world.
What Risks can Voice Really Present?
Voice adds an extra layer of potential privacy intrusion over plain old text communications. Recent progress in voice forensics, driven by advances in AI speech processing by researchers at institutions such as Carnegie Mellon University, makes it possible to profile speakers from their voice data: researchers can estimate a speaker’s bio-relevant parameters (e.g. height, weight, age, physical and mental health) as well as environmental parameters (e.g. the speaker’s location and the surrounding objects). These findings have recently been applied to help the US Coast Guard identify hoax callers. This shows how much information can be leaked about speakers when their recordings are breached by hackers, or even when they are used for data mining by cloud voice providers.
So, online speech recognition leads to privacy issues not only because the cloud server will know the speaker’s transcribed text but also because voice data reveals the speaker’s emotions (e.g. joy, sorrow, anger, surprise, etc) and the speaker’s biological features. Voice data contains biometric data that might be used to identify the speaker. In fact, applications for speaker verification (used for authentication purposes) and speaker identification (used to identify a speaker from a set of individuals) are currently being deployed or are already in use in banking and other sectors.
It has been reported that recent patents by Amazon and Google covering use cases of their digital assistants, Echo and Home respectively, reveal privacy problems that could affect smart home owners. In particular, “a troubling patent” describes the use of security cameras embedded in smart devices (e.g. IDAs, see Fig. 1) to send video shots to identify a user’s “gender, age, fashion-taste, style, mood, known languages, preferred activities, and so forth”.
Fig. 1. Consent vs Amazon’s Echo Look and Google Home Mini
Recently, there has been a rise in concern about privacy among users of Amazon Echo and Google Home, as shown in a recent paper analysing online user reviews. Amazon’s Echo attracted bad reviews, mostly concerned with privacy, after its recordings were sought as evidence in a murder case in an Arkansas court. The paper also shows that Google Home reviews were not affected by news reports warning that the devices are always listening without being activated. Of course, these devices need to be listening in order to detect their activation keywords (e.g. “Alexa” or “OK Google”), but they should not be recording anything before they spot those keywords.
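That distinction (listen for the keyword locally, record nothing beforehand) can be made concrete. Below is a minimal, hypothetical sketch in which text strings stand in for audio frames and the keyword set is illustrative: audio heard before the wake word only ever lives in a tiny rolling buffer and is never transmitted.

```python
from collections import deque

WAKE_WORDS = {"alexa", "ok google"}  # illustrative keyword set
BUFFER_FRAMES = 3                    # tiny rolling window kept locally

def listen(frames):
    """Consume a stream of frames (text stands in for audio).
    Nothing is transmitted until a wake word is heard; pre-wake
    audio scrolls out of a small local buffer and is discarded."""
    buffer = deque(maxlen=BUFFER_FRAMES)
    transmitted = []
    awake = False
    for frame in frames:
        if awake:
            transmitted.append(frame)      # only post-wake audio leaves
        elif frame.lower() in WAKE_WORDS:
            awake = True                   # start of a query
        else:
            buffer.append(frame)           # pre-wake audio stays local
    return transmitted

# Pre-wake chatter never reaches the cloud:
print(listen(["private chat", "more chat", "alexa", "what's the weather"]))
# prints: ["what's the weather"]
```

A device that streams everything would instead append every frame to `transmitted`; the privacy property lives entirely in that one branch.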
General Data Protection Regulation (GDPR) vs Voice Data.
The EU GDPR, in force since May 25th 2018, defines biometric data as “personal data resulting from specific technical processing relating to the physical, physiological or behavioural characteristics of a natural person, which allows or confirms the unique identification of that natural person, such as facial images or dactyloscopic data”. GDPR thus categorises biometric data as sensitive personal data, which must be protected and may only be processed with consent or in certain cases where processing is necessary. Speakers’ voice data relates to their physical, physiological and behavioural characteristics as described above.
Therefore, voice data as well as all other forms of data need to be protected when outsourced to the cloud, and any subsequent processing should be done with consent. Otherwise, if data is breached by hackers, un-protected breached data can be exploited with severe consequences of the type mentioned above.
Achieving Privacy in Voice-activated Applications.
Fortunately, there are solutions that allow us to enjoy the use of IDAs whilst at the same time achieving some measure of privacy. One possible solution is an on-device speech recognition system combined with searchable encryption [3, 2, 1], one of the practical methods for performing secure search over encrypted data. An alternative is to have on-device speech recognition together with on-device intent matching, eliminating the need for any cloud intermediary.
In this case the IDA device could be the user’s smartphone, laptop or desktop computer. The on-device solution avoids the data-in-use protection needed when performing computation in the cloud, and is well suited to IDAs since they normally process short-duration voice data in real time.
Performing speech recognition offline on the client side rather than in the cloud means that, at a minimum, the speaker’s biological and environmental voice features noted above stay hidden, and only the transcribed text is revealed to the cloud server so that it can respond to the speaker’s queries.
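As a sketch of this split, here is a toy illustration of the interface (the `transcribe_locally` stub is hypothetical and simply stands in for a real on-device ASR engine): the request that leaves the device carries the transcript and nothing else, so the raw waveform, and with it the speaker’s biometric and environmental features, never reaches the cloud.

```python
def transcribe_locally(audio_bytes):
    """Hypothetical stand-in for an on-device ASR engine; a real system
    would run a local acoustic and language model here."""
    return "what is the weather in london"

def build_cloud_query(audio_bytes):
    """Only the transcript leaves the device: there is no audio field
    in the payload at all."""
    text = transcribe_locally(audio_bytes)
    return {"query": text}

payload = build_cloud_query(b"\x00\x01\x02")
print(payload)             # {'query': 'what is the weather in london'}
print("audio" in payload)  # False
```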
The cloud server will use a search engine or any other convenient method to respond to queries depending on dynamic data such as news headlines, weather forecasts, travel information, shopping, etc. However, some very private tasks can be done locally at the user side without using a cloud server such as making phone calls, home management and calendar management.
Our on-device solution can also perform generic speech recognition to transcribe recorded office meetings or recorded customer service calls, for example. Privacy and security concerns aside, the prospect of outsourcing data storage to the cloud is attractive for a number of reasons: professional cloud hosting brings robust backup services and essentially unlimited capacity, and it is cheaper and more convenient than maintaining on-premise, in-house databases.

If stored data is always encrypted in the cloud then many concerns disappear, since encrypted data can still be searched with state-of-the-art searchable encryption techniques. These enable users to search their encrypted data stored in the cloud when needed, without costly download-decrypt-re-upload protocols.

Third-party queries, such as the ones required by the court in the Alexa murder case, could be privately issued through multi-client searchable encryption schemes [17, 3], where the data owner (i.e. the user who recorded the meeting or conference call) only writes the encrypted data and grants query access to an authorised third party (e.g. a court) according to a policy agreement between them. The cloud server storing the encrypted audio data cannot read the encrypted queries or the encrypted audio because it does not hold the data owner’s secret keys. It can only learn whether two encrypted queries are the same, but will never ‘see’ the actual plaintext queries.
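To give a flavour of how equality-matching over encrypted data can work, here is a toy sketch of the deterministic search-token idea underlying symmetric searchable encryption. Real schemes such as [1, 2, 3, 17] add protections (e.g. forward security and richer queries) that this sketch omits, and the filename and keywords below are made up:

```python
import hmac, hashlib, os

def keygen():
    # Data owner's secret key; never shared with the cloud server
    return os.urandom(32)

def token(key, word):
    # Deterministic search token: the server can test tokens for
    # equality but cannot invert them to recover the keyword
    return hmac.new(key, word.lower().encode(), hashlib.sha256).hexdigest()

key = keygen()

# The data owner indexes an encrypted recording under keyword tokens
index = {token(key, w): "meeting-2018-03-02.enc"
         for w in ["budget", "merger", "alexa"]}

# An authorised third party, given a token by the owner, queries the server
query = token(key, "merger")
print(index.get(query))    # the server returns the matching encrypted record
print("merger" in index)   # False: no plaintext keyword is stored server-side
```

This is exactly the leakage profile described above: the server learns when two queries match, and nothing else.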
Path of most resistance
Whilst these cryptographic approaches are exciting, they represent a threat to the current order. Google, Apple and Amazon are all building business models that insert themselves in the transaction loop between consumer and brand.
“Alexa, get me a taxi to the airport” represents a major source of potential revenue to Amazon, who act as the arbiter of your intent. You want a cheap taxi, so you don’t care if it is Uber, Lyft or a local cab company. The lucky company pays a small commission to Amazon for being chosen. If Amazon acts as the payment provider, that represents a second source of income.
What is required is an in-home device powerful enough to provide the cloud’s power of speech recognition and intent matching, allowing consumers to interact directly with the internet, yet cheap enough to provide a bulwark against the low-cost devices of the major providers. The teardown cost of the second-generation Echo Dot reported by ABI Research is $34.87. It retails at $49.99 for one device, or $40 for two, and has been seen for as low as $30. Clearly it is being treated as a loss leader for other services.
The question is, in a world where privacy is regularly sacrificed by consumers for access to free services and content, who will blink first, the internet giants who depend on our data to fund their businesses, or the consumers who provide it?
1. R. Bost. Sophos: Forward Secure Searchable Encryption. In ACM CCS 2016.
2. David Cash, Stanislaw Jarecki, Charanjit Jutla, Hugo Krawczyk, Marcel-Catalin Rosu, and Michael Steiner. Highly-scalable searchable symmetric encryption with support for boolean queries. In CRYPTO 2013.
3. Reza Curtmola, Juan Garay, Seny Kamara, and Rafail Ostrovsky. Searchable symmetric encryption: improved definitions and efficient constructions. In ACM CCS 2006.
4. Google, Amazon Patent Filings Reveal Digital Home Assistant Privacy Problems http://www.consumerwatchdog.org/sites/default/files/2017-12/Digital%20Assistants%20and%20Privacy.pdf
5. Neil M. Richards. The Puzzle of Brandeis, Privacy, and Speech.
6. Rita Singh, Joseph Keshet and Eduard Hovy. Profiling Hoax Callers. IEEE International Symposium on Technologies for Homeland Security, Boston, May 2016.
7. Lydia Manikonda, Aditya Deotale, Subbarao Kambhampati. What’s up with Privacy?: User Preferences and Privacy Concerns in Intelligent Personal Assistants. AAAI/ACM Conference on Artificial Intelligence, Ethics and Society (AIES) 2018. Url: http://www.public.asu.edu/~lmanikon/lydia-ipaprivacy.pdf
8. Samuel D. Warren and Louis D. Brandeis. The Right To Privacy. Harvard Law Review. Vol. 4, No. 5 (Dec. 15, 1890), pp. 193-220. http://www.jstor.org/stable/1321160?seq=1#page_scan_tab_contents
14. Christi Olson, “Just Say It: The Future of Search is Voice and Personal Digital Assistants,” Campaign, 25 April 2016, bit.ly/2o1IvQs
15. Ovum, “Digital Assistant and Voice AI–Capable Device Forecast : 2016-21,” April 2017
16. GDPR. https://gdpr-info.eu/
17. S. Jarecki, C. Jutla, H. Krawczyk, M. C. Rosu, and M. Steiner. Outsourced symmetric private information retrieval. In ACM CCS 13, Berlin, Germany, Nov. 4–8, 2013. ACM Press
18. Disruptive Asia, “Amazon Echo Dot MkII teardown reveals significant cost reduction effort: ABI,” https://disruptive.asia/amazon-echo-dot-teardown-abi/, January, 2017.
“Are you ready for GDPR?”. “GDPR, 6 steps you *must* take”. “Do you want to go to prison and never see your kids again?”
As CTO of a software company, I get a variation of one of these emails every single day, and I strongly suspect I am not alone. The first thing I am going to do when GDPR comes in (25th May) is have every single one of the companies spamming me thrown into jail. Or can I? What is the hype all about, and how much should you worry?
Yes it affects you. Even Americans, so read on.
GDPR, for readers outside the EU, is the General Data Protection Regulation, which passed through the EU Parliament almost two years ago and is meant to harmonise, or possibly re-harmonise, data protection legislation across the EU. And, of course, it affects anyone from outside the EU who trades in the EU.
The last major overhaul of data protection legislation was in 1995, under the Data Protection Directive, which sought to control how organisations used personal data, like telephone numbers, addresses etc. This meant that EU citizens were able to access what data was held on them by organisations (in the UK for a modest fee of £10), and to put in place a regulatory framework of what could be done with that data, e.g. could it be sold to third parties.
Since then, the world has changed with the explosion of the Internet and cloud services: and here is the hidden danger of GDPR. If you are using cloud services, you need to know where your data is being held. Actually, you should always have known that, but when GDPR kicks in, the potential fines for non-compliance are huge: up to 4% of annual global turnover or €20 million, whichever is greater. As The Register pointed out last year, the fines levied by the UK regulator (the Information Commissioner) in 2016 would have been 79 times higher.
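The “whichever is greater” clause means the €20 million floor bites for all but the largest companies. As a quick illustration (the turnover figures are hypothetical):

```python
def max_gdpr_fine(annual_global_turnover_eur):
    """Upper tier of GDPR administrative fines: 4% of annual global
    turnover or EUR 20 million, whichever is greater."""
    return max(0.04 * annual_global_turnover_eur, 20_000_000)

print(max_gdpr_fine(100_000_000))    # 4% would be 4M, so the 20M floor applies
print(max_gdpr_fine(2_000_000_000))  # 4% of 2bn: 80M
```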
Some companies, like Salesforce, have taken a very non-technical approach to some of their GDPR issues. Rather than ensuring that data is properly siloed and encrypted by geography, they have cut people off from certain services, such as Salesforce IQ.
But what should you actually be doing to support your GDPR effort that you are not doing already?
Well, if you previously did business involving EU citizens and held their data, it was ambiguous whether you were affected by the EU Data Protection Directive. As of May, that ambiguity goes: if you are processing the personal data of an EU citizen, you must appoint a representative in the EU and abide by the terms of GDPR.
It used to be easy to obfuscate your terms and conditions to obtain people’s consent to harvest and misuse their data. No more: consent must be clear and unambiguous. That has led to some pretty interesting discussions. Twice this week, I have heard people say that data obtained for “quality and training” purposes cannot be used for machine learning, because “you have to ask for specific consent for ‘machine learning’”. I think the world has gone mad. One of the ways we improve quality (and training) will be through the use of neural networks.
Hype and hysteria.
What a Difference Three Days Makes: 72 Little Hours.
If you think there has been a data breach that is likely to “result in a risk for the rights and freedoms of individuals”, you have 72 hours to notify the regulator of the breach and let your customers know. It will be interesting to see how the courts and regulators interpret this. You can see how a breach that leaks passwords is important, but what about names and addresses, data which is easy to obtain in any event?
Denial of Service Attack Access Rights
With GDPR comes new and shiny access rights. The biggest shift? You can get the personal data held on you for free, so the bar for the human DOS attack becomes much lower. What am I talking about?
Well, in my world, for example, we help people capture and monitor phone calls. Imagine if 10,000 people all at once contacted a large bank and said they had called into a call centre over a period of three weeks, two months ago.
And they want copies of those calls.

Or the “Right to be Forgotten”, another new right: you can ask any organisation to delete your data, and they must comply.

Or they want all their data provided in a ‘commonly used and machine-readable format’.
Sounds easy, right?
First off, you only have a month to get the data back, or in exceptional circumstances three months (if, as the UK regulator puts it, “requests are complex or numerous”).
And then you have to identify it.
Voice is the hidden problem in any organisation. If you store it, even just voicemails, you must be able to label and retrieve it. You might think it is as easy as matching up a phone number. Not so. At any time, 5 people in my house could use the same landline. In my office, up to 25 people share the same external number. If I have a conference call, there could be all sorts of people on it. How the hell do I work out who is who? And if I Skype a telephone number? Quite often there is no Caller ID at all.
And what if 10,000 people asked the same question at the same time?
There are simple steps, obviously, like trying to capture the names and details of people who call in and store it against the voice record. In some cases that will work, but not for my conference call, or my casual enquiry to the bank (especially if I don’t want to give my name). In highly regulated environments like trading floors, every call is recorded, but at the moment, the metadata is frequently in a mess, and calls are just labelled with the name of the institution that called, or worse, nothing at all.
What would I do?
Set up a biometric database of the people who call in (what people call a voiceprint). Voiceprints are not foolproof, especially for authentication, as the BBC demonstrated last year by fooling HSBC’s voice ID system, but they serve as a useful backstop for finding people who may be trying very hard, and somewhat maliciously, not to be found.
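As a sketch of how such a backstop might work: real systems derive a fixed-length embedding from audio using speaker-recognition models and compare embeddings by similarity. The vectors and threshold below are made-up stand-ins for those embeddings:

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

# Hypothetical enrolled voiceprints (real ones come from models such as
# i-vectors or neural speaker encoders, and are much longer)
enrolled = {
    "alice": [0.9, 0.1, 0.3],
    "bob":   [0.2, 0.8, 0.5],
}

def identify(probe, threshold=0.95):
    """Return the best-matching enrolled speaker, or None if no print
    is close enough. A backstop, not proof: voiceprints can be spoofed."""
    name, score = max(((n, cosine(probe, e)) for n, e in enrolled.items()),
                      key=lambda t: t[1])
    return name if score >= threshold else None

print(identify([0.88, 0.12, 0.31]))  # close to alice's print: 'alice'
print(identify([0.0, 0.0, 1.0]))     # matches nobody well: None
```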
GDPR does not end there. You need to ensure that your data storage is designed with privacy in mind, with proper access controls over data and encryption of data at rest and in transit. People must be trained to understand the importance of data protection, and you need clear, defined policies in place.
Hype or not?
GDPR undoubtedly throws up new hurdles for businesses, but their real extent will only become clear as the authorities start to enforce the regulation. Will they really impose the maximum fines? And will it help? We saw a steep rise in compliance at major banks after the massive fines levied by regulators over the LIBOR, FX and other scandals, but those were multi-billion-dollar fines. The largest fine in the UK to date is a mere £400,000 ($560,000).
Castel Detect™, Castel’s call monitoring software, is successfully monitoring 15 million hours of telephone calls every year using its proprietary on-premise solution. Today it is announcing the availability of the same rich functionality as a cloud-based offering.
Powered by Intelligent Voice®, Castel Detect™ delivers fast and accurate word and phrase detection for customer/agent conversation monitoring across a wide variety of industries. Telephone call compliance and monitoring is becoming increasingly important across a wide range of industries, from call centres to law enforcement and prisons.
Intelligent Voice’s speech recognition engine, based around NVIDIA® GPU technology, leverages the massively parallel world of CUDA programming to give blisteringly fast ASR (Automatic Speech Recognition) across large data sets.
“Our continued partnership with Castel, dedicated to delivering best-in-class speech analytics capabilities to contact centres across the world, has gone from strength to strength, with Castel’s agents, using Intelligent Voice’s powerful GPU-powered software, taking over 240,000 monitored calls each day on premise. It made sense to enable the same speed and accuracy running in the cloud.”
Nigel Cannings, CTO Intelligent Voice
Cloud services are well known as an inexpensive alternative to on-premise deployment, opening up these capabilities to additional customers that might otherwise lack the infrastructure to run call monitoring on premise.
About Intelligent Voice®
Intelligent Voice Limited is based in London, San Francisco and New York. The company has over 25 years’ experience of delivering mission critical systems in the financial services industry, including to several of the world’s top 20 insurers and banks. Through innovations such as the SmartTranscript® and GPU-accelerated speech recognition, Intelligent Voice allows companies to understand their businesses better, with a key focus on unlocking the value in telephone and meeting room audio. For further information about Intelligent Voice, please visit www.intelligentvoice.com
Founded in 1982, Castel designs call center software, services and solutions engineered for businesses. Castel listens, learns, plans and partners with companies to define and realize their future. For more information, news and perspectives from Castel, please visit Castel Newsroom at http://www.castel.com/news/.
Alexa, what’s my bank balance? The 2018 state of voice.
All of a sudden voice assistants are everywhere. In our phones, cars, TVs, microwaves and refrigerators.
If you don’t have at least one Amazon Echo, Google Home or Apple HomePod in your house at this point you might be in the minority: voice assistants have moved into our everyday lives in a big way, and they’re the new norm.
Given the rate of adoption, and the expansion of voice APIs for the masses, we thought it was time to look at the market, how it’s growing and where voice is headed next. 2018 is officially the year of the voice assistant.
IBM engineer William Dersch’s Shoebox, demonstrated in 1962, listened as the operator spoke numbers and commands such as “Five plus three plus eight plus six plus four minus nine, total,” and would print out the correct answer: 17.
Since that demo, humans have dreamed of interacting with their devices in a more natural way for decades, but it always felt a little far off. Science fiction, like Star Trek, 2001: A Space Odyssey and Back to the Future 2, gave us visions of the future where we’d interact with the digital world by just speaking aloud — but it always seemed like nothing more than a fantasy.
There have been many attempts at building rich voice experiences, and you likely best recall those from the 1990s. Those tools required you to sit in front of a computer and dictate for hours to train them before use, and even then they remained unreliable at best.
The real innovations that pushed voice forward to where we are now aren’t entirely obvious: cloud computing and machine learning. Neither idea was particularly new, but the way they were embraced changed everything.
If you wanted to build a voice assistant in 1996, you’d need vast server rooms of your own to perform even basic interpretation, which required massive amounts of investment. In 2018, it’s as easy as clicking a few buttons on Amazon Web Services and, poof, you’ve got a massive, high-performance data center ready to go.
Cloud computing has revolutionized the way applications and ideas are built: before, you’d need at least some metal to run your voice service on, but now you can build a vast service without ever actually seeing a server.
Machine learning alongside cloud computing created a potent combination: suddenly developers had access to vast amounts of processing power to experiment with teaching a computer how to think — and we had larger data sets to feed them.
The theory behind machine learning has been around since at least the 1980s. Dr Hermann Hauser, scientist and director of Amadeus Capital, said in a presentation that many of the ideas used by modern machine learning were invented decades ago, but the raw power wasn’t available to do anything with them.
Equipped with an ability to grasp basic concepts, voice was inevitable for computers. Siri, released in 2011, was likely the first ‘modern’ voice experience consumers had, and while it was impressive, it was obvious that the technology was nowhere near usable on an everyday basis yet.
While Siri was a great early demonstration of what voice assistants could do, it was easy to stump. Basic commands worked, but as soon as you asked it something unexpected (which happens as soon as humans feel comfortable) it would fail. Ultimately, the problem was that Siri wasn’t able to learn from its own mistakes until much later, in 2014.
It wasn’t until Amazon unveiled the Echo in 2014 that anyone started paying serious attention to voice again. By this point neural networks were beginning to find their way into consumer applications, and into the public eye, and it showed in the first reviews of Echo:
“Yet this is the future, I’m sure of it. Several times a day, the Echo blows me away with how well it converses, and how natural it feels to interact with a machine this way.”
Echo wasn’t just impressive because it was the first device on the market that made voice feel really natural, but also because of its hardware: the company combined far-field microphones, a decent speaker and made it look good.
Far-field microphones were a concept not many people were familiar with in 2015. The technology allows a device to combine multiple microphones to increase the range at which it can hear a voice and to block out surrounding noise. Combined with audio-processing improvements, it’s a potent technological leap: suddenly computers could hear and understand speech almost anywhere in a room, with a satisfying level of precision.
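The core trick behind far-field arrays is beamforming, in its simplest form delay-and-sum: shift each microphone’s signal by its known arrival delay and average, so the voice adds up coherently while noise tends to cancel. An idealised sketch, with hand-picked delays and opposite-signed noise so the effect is exact:

```python
def delay_and_sum(channels, delays):
    """Align each microphone channel by its known delay (in samples)
    and average: speech adds coherently, uncorrelated noise averages out."""
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [sum(ch[d + i] for ch, d in zip(channels, delays)) / len(channels)
            for i in range(length)]

speech = [0.0, 1.0, -1.0, 1.0, 0.0]

# Two mics hear the same speech, one sample apart, with opposite noise
mic1 = [s + 0.3 for s in speech]          # arrives first, +0.3 noise
mic2 = [0.0] + [s - 0.3 for s in speech]  # one sample later, -0.3 noise

out = delay_and_sum([mic1, mic2], delays=[0, 1])
print(out)  # the +/-0.3 noise cancels, recovering the speech
```

Real arrays estimate the delays from the sound’s direction of arrival and combine many more channels, but the principle is the same.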
The Echo came out of nowhere, at least to the consumer, and a whole new model of interaction was born overnight because Amazon was able to stand at the crux of three massive innovations intersecting with one another — it also, conveniently, runs the world’s largest cloud computing platform.
Modern voice assistants became possible because their makers were able to offload the heavy data-crunching required to interpret voice to their cloud brains. All your smart speaker does is listen for the hot word, “OK Google” or “Alexa”, which opens the pipe to the online brain for real-time recognition.
Almost nothing is done locally by these devices, bringing prices down, and making them possible to build in attractive, fabric-coated form factors for your kitchen.
The current state of voice
Google Home Device – Current State of Voice
With these developments in mind, let’s look at where we are in 2018 from the consumer’s perspective: voice went from a cute tool, to a primary mode of interaction for the home. For the first time, people are comfortable — and even prefer — to use voice for interacting with digital devices.
This has been driven by aggressive competition between Google and Amazon. Echo was first to market, leaving Google reeling and ultimately leading the company to invest billions in Home to build out what it sees as the next platform for search. If anything, Amazon Echo was Google’s first real existential threat, making Home all the more important.
As a result, we see a huge race to the bottom for voice, because it’s winner takes all.
What started out as Amazon Echo is now a multitude of products, including the smaller Echo Dot and the larger Echo premium speaker. Google has done the same, going down-market with Home Mini, and up-market with Home Max, which competes with Sonos and beyond. Apple is about to enter the game for the first time with the HomePod, which is set to ship in February.
This year’s Consumer Electronics Show was the first visceral evidence of how much this space is worth to those fighting for a spot on your bench:
“The words “Hey, Google” are currently plastered along the outside of the city’s public transportation system (the Las Vegas Monorail) that will shuttle thousands of attendees into the conference center all week. It’s a bold statement from the Mountain View, Calif.-based company, and makes one thing clear to all attendees at CES: Google wants you to get used to interacting with its digital assistant.”
All of the players in the voice space are pouring millions into it because, ultimately, they must. Google discounted Home Mini by more than half over the holidays; Amazon essentially gave Echo Dot away for free. The lower-end devices are a gateway drug into the entire ecosystem: you’re almost guaranteed to expand later, so it’s not a big deal to sell at a loss.
If any one of these assistants ‘wins’, it means millions of people will turn to that device, every day, before any other interaction model. These devices become the gateway to your home as Internet of Things devices become prevalent, because they’re a natural way to interact with gadgets sans the need to pull out your phone.
They also vacuum up data at an unprecedented scale.
Google and Amazon are fighting over this space because it’s a fantastic, friendly vehicle for capturing data — the new gold. By becoming intimate with you to the point you turn to your voice assistant first, before your phone, these companies start getting closer to understanding your thoughts, and ultimately, your intent.
Almost everything you say to Alexa and Home is crunched, and stored, for later. That voice data is a goldmine for both companies because they can use it both to train future algorithms and to figure out how to get you to buy stuff.
Once you’re comfortable with voice, it gets even more interesting from there. The biggest advantage these devices have is they can make decisions on your behalf, while profiting from it, without your knowledge.
Here’s a theoretical example: imagine you’re planning to take an Uber to the office. When you ask Echo for a ‘ride to work’ it could, eventually, sell that term to the highest bidder and send whichever service it feels like. Why would it default to Uber if Uber isn’t paying money?
Just as Amazon did for the marketplace, thousands of brands will see their value diminished in a voice world, because assistants become the ultimate gatekeepers. Amazon, Google and Apple will decide who gets in front of you, and who doesn’t — and you probably won’t ever know.
Voice assistants are about to be everywhere. You probably have one sitting in the room you’re in now. But are we ready for this?
Privacy and your voice
The biggest challenge in voice is one that the biggest players aren’t really talking about: privacy.
Both Amazon and Google store recordings of your voice as you use their devices, and both companies are able to decrypt those recordings to perform analysis, ultimately creating the world’s biggest voice database.
In our rush to voice assistants, we’ve forgotten the importance of privacy, and what having this data at scale means for the future. As these systems have improved, it’s become near trivial to recreate someone’s entire voice using a computer and a handful of snippets. If that’s not terrifying, I don’t know what is.
There are additional privacy implications as well. Because of how your voice is processed, we’re wiring hundreds of pieces of personal data, like our bank accounts, up to the cloud to use them with Alexa and Home, without really considering it.
Developers, rushing to enable the next big consumer experience, have fallen over themselves to get these integrations into your hands.
“Alexa, what’s my bank balance” is a real command, available from multiple banks. It’s a legitimately useful feature for the user, but it’s also a great way for Amazon to figure out how much money you have on hand, and an even better way for an attacker to learn more about your bank account.
“Dr Rita Singh from Carnegie Mellon University and her colleagues pieced together a profile of a serial US Coastguard prank caller solely from recordings of his voice. This included a prediction of his height and weight, and also the size of room he was calling from, leading to his apprehension by the authorities. Dr Singh’s team are using this research to identify a person’s use of intoxicants or other substances, and also the onset of various medical conditions the speaker may not even be aware they possess.”
The only major voice player to advertise itself as encrypting your voice, identity and any associated data is Apple. As with Siri on the iPhone, Apple advertises HomePod as a privacy-focused device:
“Only after ‘Hey Siri’ is recognized locally on the device will any information be sent to Apple servers, encrypted and sent using an anonymous Siri identifier.”
In other words, Apple won’t know who you are, and won’t be able to do much more with that data once it’s left your home. That claim, however, doesn’t paint the complete picture: because Apple doesn’t process requests locally, your voiceprint still ends up in the cloud, and Apple could almost certainly link it back to you if it were forced to.
The practices Apple uses add a layer of security, but don’t solve the problem — your data, and voice, now live in a cloud somewhere. Eventually, if Apple wants to move beyond relying on a local iPhone to process integrations, it’ll need to associate that data somehow and likely backpedal those claims in order to provide a connected experience.
So, what about the competition? Amazon doesn’t detail what it does with Alexa data. Google, for its part, says it encrypts data, but it also holds the keys. As a result, we don’t really know how far that promise of ‘encryption’ truly extends:
“Your security comes first in everything we do. If your data is not secure, it is not private. That is why we make sure that Google services are protected by one of the world’s most advanced security infrastructures. Conversations in Google Home are encrypted by default.”
Siri, which has improved in recent years, is clearly behind in the voice assistant race as a result of this limited data access: it’s still unable to infer basic human ways of interacting with information, such as saying “where is that?” after asking “what’s a great taco spot nearby?”
If you had told people just a few years ago that you were going to place an always-on microphone in their home, they’d have balked, and refused. Now, it’s increasingly common, and people don’t seem to be concerned about the impact of that on their privacy — but Apple’s bet is that they will.
What remains to be seen is if Apple’s bet on that privacy will matter. While Apple is just taking its first steps with HomePod, Amazon and Google are busy putting their assistants in everything from cars to microwaves.
Soon, every device around you might be listening. Are you ready for that?
Where to from here?
Voice is the new interface, and isn’t going away anytime soon. For years, we’ve chased interacting with our computers in a more natural way, and the floodgates are open. So what next?
Privacy is the final frontier, and it’ll be a huge trend throughout 2018 relating to voice assistants. GDPR, the European Union’s biggest piece of new legislation in decades, may drive that conversation forward, as it raises many questions about whether smart voice applications can be compatible with strong privacy law at all.
Companies will now have to ask for consent in simple terms, rather than buried in legalese terms and conditions. This creates many challenges, in particular for cloud-based voice assistants. Voice is considered to be personal data, therefore devices that listen ambiently should in theory ask everyone in the room for consent before sending their voice to the cloud.
Imagine the nightmare of having 10 people over for dinner, and having your Google Home device asking each of them for consent!
Over the coming year it’s likely the question of voice assistants, consent and voice security will become a large part of the discussion. With GDPR, citizens of the EU will have the right to know where and when their data is being used, and their consent will be required for expanded use of that stored data. It doesn’t matter if you’re building an experience from the US for EU customers: you’re still bound by the same rules.
Right now, most APIs for voice recognition are cloud-based, provided by Amazon and Google. This presents challenges for businesses looking to build experiences for their own apps with privacy in mind, especially with GDPR in the picture.
Local-only APIs, and on-premise solutions do exist, and may be worth considering as these concerns become even more important throughout 2018. Your customers may demand the peace of mind, and guaranteeing a level of predictable privacy is good business.
With Google, in particular, focusing almost all of its energy on voice as the next frontier for search, these questions are going to become more paramount. If we’re to imagine a future in which we’re talking to computers all day, like in the movie Her, we need to understand what happens with our voice once it leaves the room and goes online.
It’s clear that voice is here to stay, and we’ll need to get comfortable with that reality for the foreseeable future. Privacy, especially when it comes to voice, is paramount, and the question really is wide open with consumer voice: where is the line?
No cloud server or messaging system is completely secure: Just ask Hillary Clinton. Even though these systems are protected with layers of security, these layers can be hacked. Brute force attacks can crack passwords. MITM attacks using tools like sslstrip can turn secure sessions into insecure HTTP sessions. And outright manipulation of human confidence can be used to access virtually anything.
This is why homomorphic encryption is on the brink of becoming popular in cloud computing, especially when only 25% of people trust cloud providers with their data.
With homomorphic encryption, a cloud server can’t see the original content of a file. Instead of the original content being stored, a scrambled version of it is stored. And using homomorphic encryption, everything from plaintext to audio snippets can be stored, searched for, and located on the cloud server without the cloud server company seeing it (explained visually below).
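The defining property, computing on ciphertexts without ever decrypting them, can be illustrated with textbook RSA, which is multiplicatively homomorphic: the product of two ciphertexts decrypts to the product of the two plaintexts. This is only a toy, with deliberately tiny, insecure parameters, and is not the scheme described later in this post:

```python
# Textbook RSA with deliberately tiny, insecure parameters.
p, q = 61, 53
n = p * q                            # public modulus (3233)
e = 17                               # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent (Python 3.8+)

def enc(m):
    return pow(m, e, n)

def dec(c):
    return pow(c, d, n)

# Homomorphic property: multiply ciphertexts, decrypt the product.
a, b = 7, 6
c = (enc(a) * enc(b)) % n   # computed without ever knowing a or b
assert dec(c) == a * b      # 42: the server never saw 7 or 6
```

The party holding only the public key can multiply encrypted values on your behalf; only the key holder can read the result.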
For instance, if you are a doctor who has dictated sensitive patient data (as hundreds of thousands of medical professionals do every day), you could send the recording to a homomorphic speech service, then search the audio file for specific keywords. Without understanding the content of the recording, the service could locate parts of the recording with those keywords and send them back to you.
Currently, most practices send audio reports to medical transcriptionists, which is hardly secure, especially if the transcription service is outsourced and not kept in-house. At the end of the day, computers are less emotional and, therefore, more reliable with information than humans.
How files are securely stored and searched for on cloud servers
At Intelligent Voice we take emails, phone calls and other communication and put them through a powerful, AI-driven analytics engine. This helps companies see what kind of conversations their team is having with customers, among other things.
The results from this, including transcripts of video files and phone calls, can now be stored securely using homomorphic encryption on cloud servers.
We can search encrypted audio transcripts without ever decrypting them. The cloud server never sees them in plaintext form and privacy is assured.
Below we’ll go over how this works with an audio file. However, the approach is the same for files that are already in plaintext.
Architecture of homomorphic-based encrypted phonetic-string-search
We reduce an audio or text file into symbols (which could be phonetically based). These symbols are the “content” that’s indexed on our cloud servers.
The encrypted audio and symbols are uploaded to the cloud and added to an encrypted index.
When a search for a file is initiated, the search term is encrypted using our algorithms to find matching symbols. Relevant files and file portions are returned.
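A minimal sketch of those three steps, using HMAC-based search tokens as a stand-in for the patent-pending homomorphic scheme (the symbol stream is taken from the article’s “homomorphic” example; everything else is a simplifying assumption):

```python
import hmac
import hashlib

KEY = b"client-side-secret"  # stays on the client, never uploaded

def trapdoor(symbol: str) -> str:
    """One-way search token: easy to compute with the key,
    infeasible to invert from the token alone."""
    return hmac.new(KEY, symbol.encode(), hashlib.sha256).hexdigest()

# 1. Client: the phonetic encoder reduces audio to symbols.
#    (Symbol stream for the word "homomorphic".)
symbols = ["HH", "AO", "MX", "AH", "MX", "AO", "RX", "FX", "IH", "KX"]

# 2. Client -> cloud: only the encrypted symbol stream is indexed.
encrypted_stream = [trapdoor(s) for s in symbols]

# 3. Client: the search term is encrypted the same way; the server
#    then looks for a matching sub-stream over opaque tokens only.
query = [trapdoor(s) for s in ["MX", "AO", "RX"]]

def server_find(stream, needle):
    """Runs server-side; never sees a plaintext symbol."""
    for i in range(len(stream) - len(needle) + 1):
        if stream[i:i + len(needle)] == needle:
            return i
    return -1

print(server_find(encrypted_stream, query))  # prints 4: match at index 4
```

Note that a deterministic token per symbol leaks repetition patterns, which is one reason a production scheme has to be more involved than this sketch.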
Light blue: Encrypt Audio File
Blue: Cloud Server
Green: Turn Audio into phonetic symbols and encrypt
Yellow: Homomorphic representation of phonemes
Red: Client-side search preparation
Purple: Encrypted results returned
AES encryption: A very powerful “symmetric” encryption technique, i.e. the key used to encrypt is the same as the key used to decrypt
Phonetic Encoder: A process for turning speech into smaller sound-based representations
Phonetic symbols: A sound-based representation of the human voice, like a “sound alphabet”
S2S P2G: A method for converting text into equivalent phonetic symbols
Trapdoor: A mathematical function that is easy to compute in one direction, but very difficult to reverse engineer from just the answer
This symbol approach is important (and patent pending) because it reduces “search space.” Technologists have found that if you search for words using this approach, it’s painfully slow because of the processing power required. You might be trying to find over a million possible combinations.
However, if we take a word or phrase and reduce it to symbols — homomorphic HH AO MX AH MX AO RX FX IH KX, for instance — there are only dozens of available symbols. So we index these instead, across voice or text, and the search space is reduced from millions to dozens of units. Instead of looking for collections of matching words, we’re looking for matching streams.
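A rough illustration of that reduction, with hypothetical numbers and a truncated symbol alphabet:

```python
# A word-level index must distinguish every vocabulary item;
# a symbol-level index only ever sees the symbol alphabet.
vocabulary_size = 1_000_000  # order of magnitude from the text

# A few entries of a hypothetical phonetic alphabet (dozens in total):
symbol_alphabet = {"HH", "AO", "MX", "AH", "RX", "FX", "IH", "KX"}

# "homomorphic" as a symbol stream:
stream = ["HH", "AO", "MX", "AH", "MX", "AO", "RX", "FX", "IH", "KX"]

# Every unit of every stream is drawn from the same small alphabet,
# so search is a comparison of short token streams, not a lookup
# across a million distinct words.
assert all(s in symbol_alphabet for s in stream)
```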
Homomorphic encryption protects your identity — not just your content
Take a banking institution for instance. While the customer service representative is asking you questions about your social security number and where you live, voice print recognition software could be working in the background for enhanced security. It would identify characteristics of your voice like pronunciation, emphasis, accent, and talking speed.
Currently, it’s harder to steal a person’s unique voiceprint than it is to steal information like social security and account numbers. But it’s not impossible. A hacker could compromise a third-party cloud server that holds your voiceprint and use voice-mimicking software to break into your financial accounts.
The recent CloudPets hack shows just how easy this is. Using homomorphically encrypted and stored audio would significantly increase the security and privacy of this data.
Even though homomorphic encryption was discovered decades ago, there’s only recently been enough computer processing power to make homomorphic storage and search practical. Before, it would take hours or days to do what now takes seconds.
This is good news for cloud service providers: even though cloud servers can be hacked, a breach matters far less if they and their customers use homomorphic encryption to increase the overall security and privacy of their data. If the cloud has never held a “plain” version of the original data, the stolen data remains encrypted and inaccessible.