I Can Hear Your Alexa: Voice Command Fingerprinting on Smart Home Speakers

Millions of smart home speakers, such as Amazon Echo and Google Home, have been purchased by U.S. consumers. However, the security and privacy of these devices have not been rigorously examined, which raises critical concerns. In this paper, we investigate a severe and previously unreported privacy leakage in smart home speakers. Specifically, we examine a new passive attack, referred to as the voice command fingerprinting attack. We demonstrate that a passive attacker, who can only eavesdrop on encrypted traffic between a smart home speaker and a cloud server, can infer users' voice commands and compromise the privacy of millions of U.S. consumers. We formulate the attacks using machine learning algorithms. In addition to accuracy, we propose a new privacy metric, named semantic distance, which uses natural language processing to assess the privacy leakage. Our experimental results on a real-world dataset suggest that voice command fingerprinting attacks can correctly infer 33.8% of voice commands by eavesdropping on encrypted traffic. Our results also show that existing padding methods can reduce an attacker's accuracy to 14.7%, but at the cost of high communication overhead (548%) and long time delay (330%).
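
To make the setting concrete, the following is a minimal sketch of how a passive observer might match an encrypted trace to a command using only packet sizes, and how the bandwidth overhead of a padding defense could be computed. It is an illustration under stated assumptions, not the classifiers or padding methods evaluated in this work: the traces, the two features (packet count and total bytes), the nearest-centroid rule, and the fixed 1500-byte padding size are all hypothetical.

```python
from math import dist

# Hypothetical training traces: packet sizes (bytes) observed per command.
# The numbers are invented for illustration, not taken from any dataset.
TRAINING = {
    "what is the weather": [[120, 540, 1400, 1400, 310],
                            [120, 560, 1400, 1380, 300]],
    "set a timer": [[120, 200, 640],
                    [120, 210, 610]],
}


def features(trace):
    """Summarize a trace as (packet count, total bytes)."""
    return (len(trace), sum(trace))


def centroid(traces):
    """Average feature vector over all traces of one command."""
    feats = [features(t) for t in traces]
    return tuple(sum(col) / len(col) for col in zip(*feats))


def classify(trace, training=TRAINING):
    """Nearest-centroid guess of the command behind an encrypted trace."""
    centroids = {cmd: centroid(ts) for cmd, ts in training.items()}
    f = features(trace)
    return min(centroids, key=lambda cmd: dist(f, centroids[cmd]))


def pad_to_fixed(trace, size=1500):
    """Assumed defense: pad every packet up to a fixed size."""
    return [max(p, size) for p in trace]


def bandwidth_overhead(trace, padded):
    """Extra bytes sent, as a percentage of the original bytes."""
    return 100.0 * (sum(padded) - sum(trace)) / sum(trace)
```

For example, `classify([120, 205, 630])` returns `"set a timer"` with the toy data above, and padding the trace `[120, 200, 640]` to 1500-byte packets gives an overhead of 368.75%. The overhead figure is the same kind of quantity as the communication overhead reported above, although the padding schemes behind that figure may differ from this one.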
