As anyone who uses an AI voice assistant such as Alexa, Siri, Cortana or OK Google knows, the wake word isn’t the only thing that triggers a response. For example, in the case of Amazon’s Echo, the phrase “my pants on” has a high chance of waking the device, which risks accidentally recording the conversation and sharing it with Amazon. This behavior has raised privacy concerns: are assistants mistakenly recording conversations?
New research suggests that, while these assistants aren’t listening all the time, they are frequently triggered and will “wake up” in response to words other than their programmed wake word. This means that users’ privacy can be compromised, because the assistant can record conversations that users did not intend to have captured. These conversations may then be passed on to contractors who review them to further train the assistant.
The researchers used shows from Netflix that contain a significant amount of dialogue as the conversational source. They tested a variety of smart speakers from Google, Apple, Amazon and Microsoft (Harman Kardon Invoke).
All the devices woke frequently for short intervals, with Apple HomePod and Microsoft Cortana waking the most often. Interestingly, most activations are not consistent, which means that there could be some randomness to wake word recognition or that the AI learns from previous mistakes and changes how it detects wake words.
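One way to make "consistent" concrete is to play the same clip several times and count how often each activation recurs. The sketch below is only an illustration under that assumption: the function names, the 0.75 threshold and the timestamps are hypothetical and do not come from the study.

```python
from collections import Counter


def activation_rates(runs):
    """Return, for each activation timestamp, the fraction of playbacks in which it occurred.

    `runs` is a list of sets; each set holds the timestamps (seconds into the
    clip) at which the device woke during one playback of the same audio.
    """
    counts = Counter(t for run in runs for t in run)
    return {t: c / len(runs) for t, c in counts.items()}


def split_by_consistency(runs, threshold=0.75):
    """Separate activations that recur in most playbacks from those that do not.

    The threshold is an arbitrary illustrative choice, not a value from the research.
    """
    rates = activation_rates(runs)
    consistent = sorted(t for t, r in rates.items() if r >= threshold)
    inconsistent = sorted(t for t, r in rates.items() if r < threshold)
    return consistent, inconsistent


# Hypothetical data: four playbacks of the same episode.
runs = [{12, 340}, {12}, {12, 905}, {12, 340}]
consistent, inconsistent = split_by_consistency(runs)
print(consistent, inconsistent)
```

In this made-up data, only the activation at 12 seconds recurs in every playback; the others appear in half of the runs or fewer, which is the kind of pattern the researchers describe as inconsistent.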
The Office and Gilmore Girls caused far more wake-ups than other shows. These shows also had more dialogue than the others, which may mean that waking is linked to the amount of dialogue.
All shows caused devices to wake. So if you have your TV on, at some point your assistant will wake without your direction. Whether or not these activations are long enough to record sensitive information depends on the device – Amazon Echo Dot 2nd Generation and Microsoft Invoke had the longest activations, at up to 43 seconds.
Finally, because each assistant has a different set of wake words, there are patterns in the non-wake word errors.
For Amazon devices, listed by wake word:
- Amazon: words containing combinations of “I’m,” “my” or “az,” such as “my pants on” or “I was on.”
- Alexa: words that contain a “k” and sound similar to “Alexa,” such as “exclamation.”
- Echo: words containing a vowel plus “k” or “g” sounds, such as “that cool” and “back to.”
- Computer: words containing “co” or “go” followed by a nasal sound, such as “cotton” or “cash transfers.”
For Google devices, with the wake word “Hey Google”: words rhyming with “hey” (such as the letter “A” or “they”) followed by something that starts with a hard “G” or that contains “ol,” as in “cold and told.” Examples include “Okay, and what”, “I can work”, “What kind of” and “Okay, but not.”
For Apple devices, with the wake word “Hey Siri”: words rhyming with “hi” or “hey” followed by something that starts with “s” plus a vowel, or words that include a syllable rhyming with the “ri” in “Siri.” Examples include “They very”, “Hey sorry”, “Okay, Yeah” and “And seriously.”
For Microsoft devices, with the wake word “Cortana”: words starting with “co”, such as “consider” or “coming up.”
This work raises more questions than it answers. How many activations lead to audio recordings being sent to the cloud, rather than being processed only on the smart speaker? Can users see every audio recording that is made? There are also questions of bias: how well are different accents and ethnicities represented?
Many users forget their voice assistants are always sitting alongside them. But it’s one thing to forget when it’s your device; it’s another when it’s someone else’s. Voice assistant devices are increasingly used in public spaces and aren’t always avoidable. More research is needed on the true nature of privacy risks given the frequency of accidental wake ups.