Breaking text-based CAPTCHAs with variable word and character orientation

This paper presents a novel approach to the automatic segmentation and recognition of CAPTCHAs whose characters have variable orientation and are randomly collapsed (overlapped). An extension of the approach for breaking the 2012 version of reCAPTCHA is also discussed. The core idea is to straighten the characters and the word in a CAPTCHA and then use a three-color bar code to segment them. The straightened characters and the whole word are then recognized by an original SVM-based learning classifier. The main goals of this research are to reduce the vulnerability of CAPTCHAs to spam and fraud, and to provide an approach that OCR systems can use to recognize handwritten text or degraded and damaged text in ancient manuscripts. In tests, the framework built on the proposed approach achieved an average segmentation success rate of up to 82% on the 2011 version of reCAPTCHA, and the extended approach achieved 95.5% on the 2012 version, with a response time of less than 0.5 s per two-word reCAPTCHA. The implemented SVM classifier shows a competitive precision of about 94%. These results indicate that the proposed approach may be used to develop new security mechanisms that protect users against cyber-criminal activities and Internet threats.

Automatic segmentation and recognition of CAPTCHAs in Web sites is proposed. Anti-recognition techniques use collapsed characters with variable orientation. The aligned word and straightened characters are segmented by a three-color bar code. An original SVM-based learning classifier provides real-time CAPTCHA recognition. The extended approach for beating the 2012 version of reCAPTCHA shows better performance.
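The abstract does not say how the characters and word are straightened. As a rough illustration only, the sketch below estimates the orientation of a binarized glyph or word from its second-order central moments, a standard image-analysis technique that is swapped in here and is not claimed to be the authors' method. The function name `estimate_orientation` and the list-of-rows input format are assumptions made for this example.

```python
import math


def estimate_orientation(pixels):
    """Return the principal-axis angle (radians) of the foreground pixels.

    `pixels` is a list of rows containing 0 (background) or 1 (foreground).
    The angle is measured from the x-axis in image coordinates, where y
    grows downward. Hypothetical helper, not taken from the paper.
    """
    points = [(x, y)
              for y, row in enumerate(pixels)
              for x, value in enumerate(row) if value]
    if not points:
        raise ValueError("no foreground pixels")

    n = len(points)
    # Centroid of the foreground region.
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n

    # Second-order central moments (normalized by the pixel count).
    mu20 = sum((x - cx) ** 2 for x, _ in points) / n
    mu02 = sum((y - cy) ** 2 for _, y in points) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in points) / n

    # Angle of the principal axis of the pixel distribution.
    return 0.5 * math.atan2(2 * mu11, mu20 - mu02)


# A diagonal stroke: foreground wherever x == y.
diagonal = [[1 if x == y else 0 for x in range(5)] for y in range(5)]
angle = estimate_orientation(diagonal)
```

Under these assumptions, rotating the region by the negative of the estimated angle would align its principal axis with the x-axis, which is one plausible way to straighten a slanted word before segmentation.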
