Development of an Assamese OCR using Bangla OCR

This paper describes the development of an OCR system for the Assamese language, built by modifying an existing OCR system for the Bangla language. This approach is feasible because the Assamese script closely matches the Bangla script and differs from it in only a few characters. The OCR uses a two-stage recognizer based on support vector machine (SVM) classifiers, with no post-processing. A spell-checker is also implemented that detects most errors and interactively suggests corrections. The OCR was tested on about 1800 pages of good-quality printed documents and achieved an accuracy of about 97%.
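The abstract does not say how the spell-checker finds its suggestions. One common approach, assumed here and not taken from the paper, is to flag any word that is missing from a lexicon and to rank lexicon entries by Levenshtein edit distance. The Python sketch below follows that approach. The functions `levenshtein` and `check_word`, the `max_distance` threshold and the three-word lexicon are all hypothetical. Distances are counted over Unicode code points, so a dependent vowel sign such as া counts as a separate symbol.

```python
# Hypothetical sketch of a lexicon-based spell-checker with edit-distance
# suggestions. This is an assumed technique, not the paper's method.


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-code-point insertions, deletions and
    substitutions needed to turn a into b (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def check_word(word: str, lexicon: set[str], max_distance: int = 2) -> list[str]:
    """Return [] when the word is in the lexicon (no error detected).
    Otherwise return candidate corrections within max_distance,
    sorted by edit distance and then alphabetically."""
    if word in lexicon:
        return []
    scored = []
    for entry in lexicon:
        d = levenshtein(word, entry)
        if d <= max_distance:
            scored.append((d, entry))
    scored.sort()
    return [entry for _, entry in scored]


if __name__ == "__main__":
    lexicon = {"অসম", "ভাষা", "কলম"}  # hypothetical three-word lexicon
    print(check_word("ভাষা", lexicon))  # correctly spelled: []
    print(check_word("অসন", lexicon))   # one substitution away: ['অসম']
```

An interactive front end could show the returned candidates and let the user accept one or keep the original word. A lexicon of realistic size would need an index such as a BK-tree instead of the linear scan used here.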
