Recognizing objects in adversarial clutter: breaking a visual CAPTCHA

In this paper we explore object recognition in clutter. We test our object recognition techniques on Gimpy and EZ-Gimpy, examples of visual CAPTCHAs. A CAPTCHA ("Completely Automated Public Turing test to Tell Computers and Humans Apart") is a program that can generate and grade tests that most humans can pass but that current computer programs cannot. EZ-Gimpy, currently used by Yahoo!, and Gimpy are CAPTCHAs based on word recognition in the presence of clutter. These CAPTCHAs provide excellent test sets because the clutter they contain is adversarial: it is designed specifically to confuse computer programs. We have developed efficient methods based on shape context matching that identify the word in an EZ-Gimpy image 92% of the time, and the requisite three words in a Gimpy image 33% of the time. The problem of identifying words in such severe clutter provides valuable insight into the more general problem of object recognition in scenes. The methods we present are instances of a framework designed to tackle this general problem.
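
The abstract credits the attack to shape context matching, i.e. the log-polar point descriptor of Belongie, Malik, and Puzicha. As a rough illustration of that building block only, the sketch below computes shape context histograms for two sampled point sets and scores how well they can be put in correspondence. This is a minimal sketch, not the authors' actual pipeline (which adds fast pruning, deformable matching, and lexicon-driven word hypotheses); the function names, bin counts, and parameters here are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def shape_context(points, n_r=5, n_theta=12, r_inner=0.125, r_outer=2.0):
    """Log-polar shape context histograms for a set of sampled edge points.

    points: (N, 2) array of edge-point coordinates sampled from a character
    or word image. Returns an (N, n_r * n_theta) array whose i-th row
    histograms the positions of all other points relative to point i,
    binned by log distance and angle.
    """
    n = len(points)
    diff = points[None, :, :] - points[:, None, :]          # pairwise offsets
    dist = np.hypot(diff[..., 0], diff[..., 1])
    dist = dist / dist[dist > 0].mean()                      # scale-normalize by mean distance
    angle = np.arctan2(diff[..., 1], diff[..., 0]) % (2 * np.pi)

    r_edges = np.logspace(np.log10(r_inner), np.log10(r_outer), n_r + 1)
    r_bin = np.searchsorted(r_edges, dist) - 1               # 0..n_r-1 inside the log-polar annulus
    t_bin = np.floor(angle / (2 * np.pi) * n_theta).astype(int) % n_theta

    hists = np.zeros((n, n_r * n_theta))
    for i in range(n):
        for j in range(n):
            if i != j and 0 <= r_bin[i, j] < n_r:            # skip self and out-of-range points
                hists[i, r_bin[i, j] * n_theta + t_bin[i, j]] += 1
    return hists


def chi2_cost(h1, h2, eps=1e-10):
    """Pairwise chi-squared distances between two sets of normalized histograms."""
    h1 = h1 / (h1.sum(axis=1, keepdims=True) + eps)
    h2 = h2 / (h2.sum(axis=1, keepdims=True) + eps)
    num = (h1[:, None, :] - h2[None, :, :]) ** 2
    den = h1[:, None, :] + h2[None, :, :] + eps
    return 0.5 * (num / den).sum(axis=-1)                    # (N1, N2) cost matrix


def match_score(image_points, template_points):
    """Lower is better: mean matching cost under an optimal point assignment."""
    cost = chi2_cost(shape_context(image_points), shape_context(template_points))
    rows, cols = linear_sum_assignment(cost)                 # Hungarian assignment
    return cost[rows, cols].mean()
```

Under these assumptions, scoring a candidate word amounts to rendering it as a template point set and picking the lexicon entry with the lowest match_score against points sampled from the CAPTCHA image; the published system makes this tractable over a large dictionary with fast pruning and letter- and word-level hypotheses rather than exhaustive matching.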
