WearID: Wearable-Assisted Low-Effort Authentication to Voice Assistants using Cross-Domain Speech Similarity

Because voice input is inherently open, voice assistant (VA) systems (e.g., Google Home and Amazon Alexa) are at high risk of leaking sensitive information (e.g., personal schedules and shopping accounts). Although existing VA systems may use voice features to identify users, they remain vulnerable to various acoustic attacks (e.g., impersonation, replay, and hidden command attacks). In this work, we focus on the security of emerging VA systems and aim to protect users' highly sensitive information from these attacks. To this end, we propose WearID, a system that uses an off-the-shelf wearable device (e.g., a smartwatch or bracelet) as a secure token to verify the user's voice commands to the VA system. In particular, WearID exploits the motion sensors readily available on most wearables to describe the command sound in the vibration domain, and compares the received command sound across two domains (i.e., the wearable's motion sensor vs. the VA device's microphone) to ensure that the sound comes from the legitimate user.
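To make the cross-domain check concrete, the following is a minimal sketch of the general idea, not WearID's actual algorithm. It assumes the similarity can be approximated by correlating short-time energy envelopes of the two recordings. The function names, the 50 ms frame length, and the 0.6 acceptance threshold are all hypothetical choices made for illustration.

```python
import numpy as np


def energy_envelope(signal, sample_rate, frame_sec=0.05):
    """Return the RMS energy of each non-overlapping frame of `signal`."""
    frame_len = max(1, int(round(sample_rate * frame_sec)))
    n_frames = len(signal) // frame_len
    frames = np.asarray(signal[: n_frames * frame_len], dtype=float)
    frames = frames.reshape(n_frames, frame_len)
    # Remove the per-frame offset (e.g., gravity on an accelerometer axis).
    frames = frames - frames.mean(axis=1, keepdims=True)
    return np.sqrt((frames ** 2).mean(axis=1))


def cross_domain_similarity(mic, mic_rate, accel, accel_rate, frame_sec=0.05):
    """Pearson correlation between microphone and motion-sensor envelopes.

    Both signals are assumed to be time-aligned recordings of the same
    interval; only their sampling rates differ.
    """
    env_mic = energy_envelope(mic, mic_rate, frame_sec)
    env_acc = energy_envelope(accel, accel_rate, frame_sec)
    n = min(len(env_mic), len(env_acc))
    if n < 2:
        return 0.0
    a, b = env_mic[:n], env_acc[:n]
    # A flat envelope carries no evidence of speech; treat it as no match.
    if a.std() < 1e-12 or b.std() < 1e-12:
        return 0.0
    return float(np.corrcoef(a, b)[0, 1])


def verify_command(mic, mic_rate, accel, accel_rate, threshold=0.6):
    """Accept the command only if the two domains agree closely enough.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return cross_domain_similarity(mic, mic_rate, accel, accel_rate) >= threshold
```

Under these assumptions, a command spoken by the wearer produces speech bursts at the same moments in both recordings, so the correlation is high. A replayed or hidden command that reaches only the VA device's microphone leaves the wearable's envelope flat or uncorrelated, so the check rejects it.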
