Hearing Your Voice is Not Enough: An Articulatory Gesture Based Liveness Detection for Voice Authentication

Voice biometrics is drawing increasing attention as a promising alternative to legacy passwords for mobile authentication. A growing body of work, however, shows that voice biometrics is vulnerable to replay attacks, in which an adversary spoofs a voice authentication system with a pre-recorded voice sample collected from a genuine user. In this work, we propose VoiceGesture, a liveness detection system that detects replay attacks on smartphones. It identifies a live user by leveraging both the unique articulatory gestures the user makes when speaking a passphrase and recent advances in mobile audio hardware. Specifically, our system re-uses the smartphone as a Doppler radar: it transmits a high-frequency acoustic signal from the built-in speaker and listens to the reflections at the microphone while the user speaks a passphrase. The reflections from the user's articulatory gestures produce Doppler shifts, which are then analyzed to detect a live user. VoiceGesture is practical because it requires neither cumbersome operations nor any hardware beyond the speaker and microphone commonly available on smartphones. Our experimental evaluation with 21 participants and different types of phones shows that it achieves over 99% detection accuracy at around 1% Equal Error Rate (EER). The results also show that it is robust to different phone placements and works with different sampling frequencies.
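
To give a rough sense of the Doppler shifts involved, the following minimal sketch computes the frequency shift of a tone reflected off a moving surface, using the standard relation for a co-located transmitter and receiver. The function name, the 20 kHz carrier, the 0.1 m/s articulator speed, and the 343 m/s speed of sound are illustrative assumptions; none of them is taken from the paper, and this is not the authors' detection pipeline.

```python
# Illustrative sketch only: the carrier frequency, velocity, and speed of
# sound below are assumed values, not parameters reported in the paper.

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 degrees Celsius


def reflected_doppler_shift(carrier_hz: float,
                            velocity_m_s: float,
                            speed_of_sound: float = SPEED_OF_SOUND_M_S) -> float:
    """Return the Doppler shift (Hz) of a tone reflected off a moving surface.

    The speaker and microphone are treated as co-located and stationary.
    A positive velocity means the reflector moves toward the phone, which
    raises the received frequency; a negative velocity lowers it.
    """
    if abs(velocity_m_s) >= speed_of_sound:
        raise ValueError("reflector speed must be below the speed of sound")
    # Two-way Doppler effect: the reflector first receives a shifted tone,
    # then acts as a moving source when it re-radiates that tone.
    received_hz = carrier_hz * (speed_of_sound + velocity_m_s) / (
        speed_of_sound - velocity_m_s)
    return received_hz - carrier_hz


if __name__ == "__main__":
    # Hypothetical example: a 20 kHz tone and a lip moving at 0.1 m/s.
    shift = reflected_doppler_shift(20_000.0, 0.1)
    print(f"Doppler shift: {shift:.2f} Hz")
```

For speeds far below the speed of sound, the result is close to the familiar approximation 2·v·f0/c, so a higher carrier frequency yields a proportionally larger shift for the same movement.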
