A Continuous Liveness Detection System for Text-independent Speaker Verification

Voice authentication is drawing increasing attention and is becoming an attractive alternative to passwords for mobile authentication. Recent advances in mobile technology are further accelerating the adoption of voice biometrics across a diverse array of mobile applications. However, recent studies show that voice authentication is vulnerable to replay attacks, in which an adversary spoofs a voice authentication system using a pre-recorded voice sample collected from the victim. In this paper, we propose VoiceLive, a liveness detection system for both text-dependent and text-independent voice authentication on smartphones. VoiceLive detects a live user by leveraging the user’s unique vocal system and the stereo recording of smartphones. In particular, utilizing the built-in gyroscope, loudspeaker and microphone, VoiceLive first measures the smartphone’s distance and angle from the user. It then captures the position-specific time-difference-of-arrival (TDoA) changes in a sequence of phoneme sounds arriving at the two microphones of the phone, and uses these unique TDoA dynamics, which do not exist under replay attacks, for liveness detection. VoiceLive is practical because it requires no additional hardware beyond two-channel stereo recording, which is supported by virtually all smartphones. Our experimental evaluation with 12 participants and different types of phones shows that VoiceLive achieves over 99% detection accuracy at around 1% Equal Error Rate (EER) on the text-dependent system, and around 99% accuracy at 2% EER on the text-independent one. Results also show that VoiceLive is robust to different phone positions, i.e., users are free to hold the smartphone at different distances and
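To make the TDoA idea concrete, the following is a minimal sketch of estimating the arrival-time difference of one sound segment between two stereo channels. It uses plain cross-correlation, which is an assumption for illustration and not necessarily the estimator VoiceLive uses. The function name `estimate_tdoa`, the 48 kHz sample rate, and the synthetic test signal are all hypothetical.

```python
import numpy as np


def estimate_tdoa(left, right, sample_rate):
    """Estimate how much later a sound reaches `right` than `left`, in seconds.

    A positive value means the sound arrived at the right channel later.
    Plain cross-correlation is used here for illustration only.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    # Full cross-correlation; output index i corresponds to lag i - (len(left) - 1).
    corr = np.correlate(right, left, mode="full")
    lag_samples = int(np.argmax(corr)) - (len(left) - 1)
    return lag_samples / sample_rate


# Synthetic example (hypothetical data): the same noise burst reaches the
# second microphone 3 samples after the first.
sample_rate = 48_000
rng = np.random.default_rng(0)
burst = rng.standard_normal(1024)
delay = 3
left_channel = np.concatenate([burst, np.zeros(delay)])
right_channel = np.concatenate([np.zeros(delay), burst])

tdoa_seconds = estimate_tdoa(left_channel, right_channel, sample_rate)
tdoa_samples = round(tdoa_seconds * sample_rate)
```

Applied to each phoneme segment of a stereo recording in turn, an estimator of this kind would yield a sequence of TDoA values. As described above, that sequence varies across phonemes for a live speaker, whereas a replayed recording lacks these dynamics.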
