Solving Google's Continuous Audio CAPTCHA with HMM-Based Automatic Speech Recognition

CAPTCHAs play critical roles in maintaining the security of various Web services by distinguishing humans from automated programs and preventing Web services from being abused. CAPTCHAs are designed to block automated programs by presenting questions that are easy for humans but difficult for computers, e.g., recognition of visual digits or audio utterances. Recent audio CAPTCHAs, such as Google’s audio reCAPTCHA, have presented overlapping and distorted target voices with stationary background noise. We investigate the security of overlapping audio CAPTCHAs by developing an audio reCAPTCHA solver. Our solver is constructed based on speech recognition techniques using hidden Markov models (HMMs). It is implemented by using an off-the-shelf library HMM Toolkit. Our experiments revealed vulnerabilities in the current version of audio reCAPTCHA with the solver cracking 52% of the questions. We further explain that background stationary noise did not contribute to enhance security against our solver.

[1]  John C. Mitchell,et al.  How Good Are Humans at Solving CAPTCHAs? A Large Scale Evaluation , 2010, 2010 IEEE Symposium on Security and Privacy.

[2]  Jeff Yan,et al.  A low-cost attack on a Microsoft captcha , 2008, CCS.

[3]  Hsiao-Wuen Hon,et al.  Large-vocabulary speaker-independent continuous speech recognition using HMM , 1988, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.

[4]  Steve Young,et al.  The HTK hidden Markov model toolkit: design and philosophy , 1993 .

[5]  John C. Mitchell,et al.  The Failure of Noise-Based Non-continuous Audio Captchas , 2011, 2011 IEEE Symposium on Security and Privacy.

[6]  Mary Czerwinski,et al.  Building Segmentation Based Human-Friendly Human Interaction Proofs (HIPs) , 2005, HIP.

[7]  Mary Czerwinski,et al.  Computers beat Humans at Single Character Recognition in Reading based Human Interaction Proofs (HIPs) , 2005, CEAS.

[8]  Luis von Ahn,et al.  Breaking Audio CAPTCHAs , 2008, NIPS.

[9]  Manuel Blum,et al.  reCAPTCHA: Human-Based Character Recognition via Web Security Measures , 2008, Science.

[10]  Sadaoki Furui,et al.  Speaker-independent isolated word recognition using dynamic features of speech spectrum , 1986, IEEE Trans. Acoust. Speech Signal Process..

[11]  W. Kunin,et al.  Extinction risk and the 1/f family of noise models. , 1999, Theoretical population biology.

[12]  S. Umesh,et al.  Frequency warping and the Mel scale , 2002, IEEE Signal Processing Letters.

[13]  John Langford,et al.  Telling humans and computers apart automatically , 2004, CACM.

[14]  K. Maekawa CORPUS OF SPONTANEOUS JAPANESE : ITS DESIGN AND EVALUATION , 2003 .

[15]  Jitendra Malik,et al.  Recognizing objects in adversarial clutter: breaking a visual CAPTCHA , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[16]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[17]  Venu Govindaraju,et al.  Human Interactive Proofs , 2008 .