Recognition based segmentation of connected characters in text based CAPTCHAs

Text-based CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is the most widely used mechanism by which popular web sites differentiate between machines and humans. However, extensive work by computer vision researchers has left it vulnerable to automated attacks. Segmentation is the most difficult task in the automatic recognition of CAPTCHAs, so contemporary text-based CAPTCHAs connect their characters together to make them as resistant to segmentation as possible. In this research, we identify vulnerabilities in such CAPTCHAs. We apply a novel mechanism, recognition-based segmentation, to crop the connected characters: a sliding-window neural network classifier both recognizes and segments them. Experiments show a 95.5% recognition success rate and a 58.25% segmentation success rate on our dataset of Tmall CAPTCHAs. The algorithm was further tested on two other datasets with slightly different implementations, and promising results were achieved.
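
The idea of recognition-based segmentation can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the greedy overlap suppression, and the `classify(crop) -> (label, confidence)` interface are assumptions made for illustration, and a toy template matcher stands in for the trained neural network.

```python
import numpy as np

def sliding_windows(image, width, stride):
    """Yield (x_offset, crop) for every full window across the image width."""
    _, total_width = image.shape
    for x in range(0, total_width - width + 1, stride):
        yield x, image[:, x:x + width]

def segment_by_recognition(image, classify, width, stride=1, min_conf=0.5):
    """Pick non-overlapping windows that the classifier is most confident about.

    classify(crop) -> (label, confidence) is a hypothetical interface standing
    in for a trained neural network classifier.
    """
    scored = []
    for x, crop in sliding_windows(image, width, stride):
        label, conf = classify(crop)
        if conf >= min_conf:
            scored.append((conf, x, label))
    # Visit the most confident windows first; skip any that overlap a kept one.
    scored.sort(key=lambda t: (-t[0], t[1]))
    kept = []
    for conf, x, label in scored:
        if all(abs(x - kept_x) >= width for _, kept_x, _ in kept):
            kept.append((conf, x, label))
    kept.sort(key=lambda t: t[1])
    return [(x, x + width, label) for _, x, label in kept]

# Toy stand-in for the classifier: pixel agreement with fixed 3x3 templates.
TEMPLATES = {
    "H": np.array([[1, 0, 1], [1, 1, 1], [1, 0, 1]]),
    "T": np.array([[1, 1, 1], [0, 1, 0], [0, 1, 0]]),
}

def template_classifier(crop):
    scores = {label: float(np.mean(crop == tpl)) for label, tpl in TEMPLATES.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Two glyphs joined with no gap, mimicking connected characters.
image = np.hstack([TEMPLATES["H"], TEMPLATES["T"]])
segments = segment_by_recognition(image, template_classifier, width=3)
print(segments)  # [(0, 3, 'H'), (3, 6, 'T')]
```

Each returned tuple gives the left and right cut positions and the recognized label, so recognition and segmentation come out of the same pass. A real attack would also need to handle characters of varying width, which this fixed-width sketch does not attempt.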
