When Speakers Are All Ears: Characterizing Misactivations of IoT Smart Speakers

Abstract Internet-connected voice-controlled speakers, also known as smart speakers, are increasingly popular due to their convenience for everyday tasks such as asking about the weather forecast or playing music. However, such convenience comes with privacy risks: smart speakers must constantly listen in order to activate when the "wake word" is spoken, and are known to transmit audio from their environment and record it on cloud servers. This paper focuses on the privacy risk from smart speaker misactivations, i.e., cases where a device activates, transmits, and/or records audio from its environment even though the wake word was not spoken. To enable repeatable, scalable experiments that expose smart speakers to conversations without wake words, we play audio from popular TV shows spanning diverse genres. After playing two rounds of 134 hours of content from 12 TV shows near popular smart speakers in both the US and the UK, we observed an average of 0.95 misactivations per hour, or 1.43 misactivations for every 10,000 words spoken; for some devices, 10% of misactivations lasted at least 10 seconds. We characterize the sources of these misactivations and their implications for consumers, and discuss potential mitigations.
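The two rates reported above are simple normalizations of a raw misactivation count by playback time and by transcript length. A minimal sketch of that arithmetic is below; the raw counts (total misactivations and total words spoken) are hypothetical values chosen only so the resulting rates match the figures in the abstract, not numbers taken from the paper.

```python
def misactivation_rates(misactivations, hours_played, words_spoken):
    """Return (misactivations per hour, misactivations per 10,000 words)."""
    per_hour = misactivations / hours_played
    per_10k_words = misactivations / words_spoken * 10_000
    return per_hour, per_10k_words

# Hypothetical counts; only the 268 hours (two rounds of 134 hours of
# TV content) comes from the abstract.
per_hour, per_10k = misactivation_rates(
    misactivations=255,        # assumed total across all devices
    hours_played=268,          # 2 rounds x 134 hours
    words_spoken=1_783_000,    # assumed transcript word count
)
print(round(per_hour, 2), round(per_10k, 2))  # 0.95 1.43
```

Normalizing per 10,000 words, in addition to per hour, controls for how dialogue-dense a given TV show is, so shows with sparse dialogue are not unfairly penalized.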
