Hitting Three Birds with One System: A Voice-Based CAPTCHA for the Modern User

CAPTCHA challenges are widely used across the Internet to prevent automated scripts from spamming web services. However, recent technological developments have rendered the conventional CAPTCHA both insecure and inconvenient to use. In this paper, we propose vCAPTCHA, a voice-based CAPTCHA system designed to: (1) enable more secure human authentication, (2) integrate more conveniently with the modern devices that access web services, and (3) help collect vast amounts of annotated speech data for languages, accents, and dialects that are under-represented in current speech corpora, thus making speech technologies accessible to more people around the world. vCAPTCHA requires users to speak their responses, rather than type them, in order to unlock or use web services. Each response is analyzed to determine whether it was naturally produced, and is transcribed to verify that it contains the challenge sentence. We build a prototype of vCAPTCHA to assess its performance and practicality. Our preliminary results on the ASVspoof datasets show an attack success rate as low as 2.3%, while the human success rate remains comparable to that of current CAPTCHAs.
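The acceptance rule described above has two stages: a response must first pass a natural-speech (anti-spoofing) check, and its transcription must then contain the challenge sentence. The sketch below shows only that decision logic under stated assumptions. It is not the vCAPTCHA implementation. The `spoof_score` field, the 0.5 threshold, and every function name are hypothetical. The detector and speech recognizer that would produce `spoof_score` and `transcript` are assumed to exist and are not shown. The block also shows how the two reported metrics follow from accept/reject outcomes: the attack success rate is the fraction of spoofed attempts accepted, and the human success rate is the fraction of genuine attempts accepted.

```python
import re
from dataclasses import dataclass


@dataclass
class Response:
    # Hypothetical detector output: higher means more likely spoofed or replayed.
    spoof_score: float
    # Hypothetical speech-recognizer output for the spoken response.
    transcript: str


def normalize(text: str) -> list[str]:
    """Lower-case and keep only word tokens so casing and punctuation do not matter."""
    return re.findall(r"[a-z']+", text.lower())


def contains_challenge(transcript: str, challenge: str) -> bool:
    """True if the challenge sentence appears as a contiguous word sequence."""
    t, c = normalize(transcript), normalize(challenge)
    if not c:
        return False
    return any(t[i:i + len(c)] == c for i in range(len(t) - len(c) + 1))


def verify(resp: Response, challenge: str, spoof_threshold: float = 0.5) -> bool:
    # Stage 1: reject responses the detector flags as not naturally produced.
    if resp.spoof_score >= spoof_threshold:
        return False
    # Stage 2: accept only if the transcription contains the challenge sentence.
    return contains_challenge(resp.transcript, challenge)


def success_rate(outcomes: list[bool]) -> float:
    """Fraction of attempts accepted (attack or human success rate)."""
    return sum(outcomes) / len(outcomes)


if __name__ == "__main__":
    challenge = "the quick brown fox"
    print(verify(Response(0.1, "The quick brown fox."), challenge))  # genuine, correct sentence
    print(verify(Response(0.9, "the quick brown fox"), challenge))   # flagged as spoofed
    print(verify(Response(0.1, "the quick fox"), challenge))         # sentence incomplete
```

Exact word matching is deliberately strict here. A deployed system would likely need to tolerate some recognition errors, for example by accepting transcripts within a small word error rate of the challenge, at the cost of a somewhat weaker check.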
