reCAPTCHA-assisted OCR for Devanagari Texts

CAPTCHA is a challenge-response test that uses distorted characters on the web to determine whether a user is a human or a computer. reCAPTCHA puts this human effort to constructive use: it digitizes text from old documents that is difficult for OCR while also authenticating the human user. An alternative paradigm is an integrated OCR-reCAPTCHA system, in which the OCR engine digitizes documents with high accuracy and assigns each character a confidence score, so that low-scoring characters are digitized from human responses instead. This work is a first attempt to build a reCAPTCHA-assisted OCR system for the Devanagari script that learns from human responses on the web and digitizes documents with high accuracy.
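The confidence-based routing described above can be illustrated with a minimal Python sketch. It is not the system's actual implementation: the `Recognized` record, the `route` and `resolve` functions, the 0.8 confidence threshold, and the three-vote majority rule are all assumptions introduced here for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Recognized:
    """One OCR output for a segmented character image (hypothetical record)."""
    image_id: str
    label: str
    confidence: float  # assumed to lie in [0, 1]


def route(results, threshold=0.8):
    """Split OCR outputs into accepted labels and images to show as challenges.

    The threshold is an assumed value, not one reported by the source.
    """
    accepted = {}
    challenges = []
    for r in results:
        if r.confidence >= threshold:
            accepted[r.image_id] = r.label
        else:
            challenges.append(r.image_id)
    return accepted, challenges


def resolve(responses, min_votes=3):
    """Return the label typed by a strict majority of users, or None.

    Requiring several agreeing answers is an assumed safeguard against
    a single wrong human response.
    """
    if len(responses) < min_votes:
        return None
    label, count = Counter(responses).most_common(1)[0]
    return label if count * 2 > len(responses) else None


results = [Recognized("c1", "क", 0.97), Recognized("c2", "ष", 0.41)]
accepted, challenges = route(results)
human_label = resolve(["ष", "ष", "प"])
```

Here the high-confidence character `c1` is accepted directly, while `c2` is queued as a challenge and takes the label that most users agree on. Labels resolved this way could also be fed back as training data, which is one way to read "learns from human responses".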
