Leveraging the Mixed-Text Segmentation Problem to Design Secure Handwritten CAPTCHAs

In this paper we present a novel CAPTCHA based on mixed-text segmentation, that is, separating handwriting from printed text, which remains a hard, unsolved AI problem. The proposed CAPTCHA overlays generated handwritten word images on a generated printed-text background. We first modify an existing synthetic handwriting generation technique so that it supports character-level perturbations. These perturbations are parameterized, which allows the complexity of the generated handwritten words to be varied. We then use the output of the modified synthetic handwriting generator as the foreground of the mixed-text CAPTCHA. Experiments show that the proposed approach effectively distinguishes between humans and machines: human recognition accuracy averages 0.77, while machine accuracy is below 0.0001.
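The two-stage pipeline described above (character-level perturbation of a synthetic handwritten word, then overlay on a printed-text background) can be illustrated with a minimal sketch. It is not the authors' generator. The stroke representation, the perturbation model (a random rotation, scaling and shift per character, controlled by a single hypothetical `level` parameter in [0, 1]), the toy rasterizer, and the darker-pixel-wins overlay are all assumptions chosen for illustration, and every function name below is hypothetical.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]           # one pen-down trajectory
Character = List[Stroke]       # a character is one or more strokes
Image = List[List[int]]        # grayscale rows: 0 = ink, 255 = paper


def perturb_character(char: Character, level: float, rng: random.Random) -> Character:
    """Randomly rotate, scale and shift one character (hypothetical model).

    level = 0 leaves the character unchanged; larger values allow up to
    15 degrees of rotation, 20% scaling and a 2-pixel shift.
    """
    angle = rng.uniform(-1, 1) * level * math.radians(15)
    scale = 1.0 + rng.uniform(-1, 1) * level * 0.2
    dx = rng.uniform(-1, 1) * level * 2.0
    dy = rng.uniform(-1, 1) * level * 2.0
    # Rotate and scale about the character's centroid so it stays in place.
    pts = [p for stroke in char for p in stroke]
    cx = sum(x for x, _ in pts) / len(pts)
    cy = sum(y for _, y in pts) / len(pts)
    cos_a, sin_a = math.cos(angle), math.sin(angle)

    def move(p: Point) -> Point:
        rx, ry = p[0] - cx, p[1] - cy
        return (cx + scale * (rx * cos_a - ry * sin_a) + dx,
                cy + scale * (rx * sin_a + ry * cos_a) + dy)

    return [[move(p) for p in stroke] for stroke in char]


def perturb_word(word: List[Character], level: float, seed: int) -> List[Stroke]:
    """Perturb each character independently and flatten to a list of strokes."""
    rng = random.Random(seed)
    return [s for char in word for s in perturb_character(char, level, rng)]


def render(strokes: List[Stroke], width: int, height: int) -> Image:
    """Toy rasterizer: sample points densely along each stroke segment."""
    img = [[255] * width for _ in range(height)]
    for stroke in strokes:
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            steps = max(1, int(max(abs(x1 - x0), abs(y1 - y0)) * 2))
            for i in range(steps + 1):
                t = i / steps
                x = round(x0 + t * (x1 - x0))
                y = round(y0 + t * (y1 - y0))
                if 0 <= x < width and 0 <= y < height:
                    img[y][x] = 0
    return img


def overlay(foreground: Image, background: Image) -> Image:
    """Keep the darker pixel, approximating handwriting ink written over print."""
    return [[min(f, b) for f, b in zip(frow, brow)]
            for frow, brow in zip(foreground, background)]
```

In use, one would call `perturb_word(word, level, seed)`, render the result, and `overlay` it on a rendered printed-text image of the same size. Raising `level` corresponds to the abstract's idea of varying handwritten word complexity. The darker-pixel rule is one simple way to make the handwriting and the print share pixels, which is what makes segmentation hard.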
