VibLive: A Continuous Liveness Detection for Secure Voice User Interface in IoT Environment

The voice user interface (VUI) is increasingly used to authenticate users to numerous devices and applications. This widespread adoption of VUIs in IoT environments such as individual homes and businesses raises extensive privacy and security concerns. Current VUIs that adopt traditional voice authentication methods are vulnerable to spoofing attacks, in which a malicious party spoofs the VUI with pre-recorded or synthesized voice commands of the genuine user. In this paper, we design VibLive, a continuous liveness detection system for secure VUIs in IoT environments. The underlying principle of VibLive is to capture the dissimilarities between bone-conducted vibrations and air-conducted voices when a human speaks, and to use them for liveness detection. VibLive is a text-independent system that verifies live users and detects spoofing attacks without requiring users to enroll specific passphrases. Moreover, VibLive is practical and transparent, as it requires neither additional operations nor extra hardware beyond the loudspeaker and microphone that VUIs are commonly equipped with. Our evaluation with 25 participants under different IoT-oriented experimental settings shows that VibLive is highly effective, with over 97% detection accuracy. Results also show that VibLive is robust across various usage scenarios.
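To make the underlying principle concrete, the following is a minimal sketch of how a difference between a bone-conducted and an air-conducted signal might be quantified. It is not VibLive's actual feature set or classifier, which the abstract does not specify. It assumes, for illustration only, that the bone-conducted path attenuates higher frequencies more strongly than the air-conducted path, and it uses synthetic two-tone signals in place of real recordings. The function names `low_band_energy_ratio` and `tone_mix`, the 1000 Hz cutoff, and the tone amplitudes are all hypothetical choices.

```python
import cmath
import math


def low_band_energy_ratio(signal, sample_rate, cutoff_hz):
    """Return the fraction of spectral energy at or below cutoff_hz.

    Uses a naive one-sided DFT, which is slow but dependency-free.
    """
    n = len(signal)
    total = 0.0
    low = 0.0
    for k in range(n // 2 + 1):
        coeff = sum(
            signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        energy = abs(coeff) ** 2
        total += energy
        if k * sample_rate / n <= cutoff_hz:
            low += energy
    return low / total if total > 0 else 0.0


def tone_mix(freq_amp_pairs, sample_rate, n):
    """Synthesize n samples of a sum of sine tones (hypothetical test input)."""
    return [
        sum(a * math.sin(2 * math.pi * f * t / sample_rate) for f, a in freq_amp_pairs)
        for t in range(n)
    ]


SAMPLE_RATE = 8000  # Hz, assumed
N = 256             # samples per analysis frame, assumed

# Assumption: the air-conducted signal keeps its high-frequency component,
# while the bone-conducted signal has it strongly attenuated.
air_like = tone_mix([(250, 1.0), (2000, 1.0)], SAMPLE_RATE, N)
bone_like = tone_mix([(250, 1.0), (2000, 0.1)], SAMPLE_RATE, N)

air_ratio = low_band_energy_ratio(air_like, SAMPLE_RATE, 1000)
bone_ratio = low_band_energy_ratio(bone_like, SAMPLE_RATE, 1000)

print(f"air-like low-band ratio:  {air_ratio:.3f}")
print(f"bone-like low-band ratio: {bone_ratio:.3f}")
```

In this toy setting the bone-like signal concentrates more of its energy below the cutoff than the air-like signal. A real system would need features and a decision rule validated on recorded speech, not a single hand-picked band ratio.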
