Odia Running Text Recognition Using Moment-Based Feature Extraction and Mean Distance Classification Technique

Optical character recognition (OCR) is the automatic recognition of characters from optically scanned documents for the purposes of editing, indexing, and searching, as well as reducing storage space. Developing OCR for Indian scripts is an active area of research because the large number of letters in the alphabet, their sophisticated combinations, and the complicated graphemes they form pose a great challenge to an OCR designer. We are developing an OCR system for the Odia language, the official language of Odisha (formerly known as Orissa). In this paper, we attempt to recognize the vowels, consonants, matras, and compound characters of running Odia script. First, the scanned text is segmented into individual Odia symbols. Then a feature vector is extracted for each symbol using two-dimensional moments and the Hough transform (based on topological and geometrical properties), and these feature vectors are used to classify and recognize the symbol. We found that the proposed model achieves a recognition rate of up to 100% on running text that contains no touching characters.
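The feature extraction and classification steps named in the title and abstract can be illustrated with a minimal sketch. The abstract does not specify which moments, how many of them, or which distance measure is used, so everything below is an assumption for illustration only: normalized central moments of order 2 and 3 serve as features, classification picks the class whose mean feature vector is nearest in Euclidean distance, and the Hough-transform features are omitted. The function names `moment_features`, `fit_class_means`, and `classify` are hypothetical.

```python
import numpy as np


def moment_features(img, max_order=3):
    """Normalized central moments eta_pq with 2 <= p+q <= max_order.

    `img` is a 2-D array for one segmented glyph (nonzero = ink).
    Assumption: the paper's exact moment set is not given in the abstract.
    """
    img = np.asarray(img, dtype=float)
    m00 = img.sum()
    if m00 == 0:
        raise ValueError("empty glyph image")
    ys, xs = np.mgrid[:img.shape[0], :img.shape[1]]
    # Centroid of the glyph; subtracting it gives translation invariance.
    xbar = (xs * img).sum() / m00
    ybar = (ys * img).sum() / m00
    feats = []
    for p in range(max_order + 1):
        for q in range(max_order + 1 - p):
            if p + q < 2:
                continue  # orders 0 and 1 carry no shape information
            mu = (((xs - xbar) ** p) * ((ys - ybar) ** q) * img).sum()
            # Dividing by m00^(1 + (p+q)/2) normalizes for scale.
            feats.append(mu / m00 ** (1 + (p + q) / 2))
    return np.array(feats)


def fit_class_means(features, labels):
    """Average the training feature vectors of each class."""
    features = np.asarray(features, dtype=float)
    means = {}
    for label in sorted(set(labels)):
        rows = [i for i, l in enumerate(labels) if l == label]
        means[label] = features[rows].mean(axis=0)
    return means


def classify(feature, class_means):
    """Return the label whose class mean is nearest (Euclidean distance)."""
    return min(class_means,
               key=lambda c: np.linalg.norm(feature - class_means[c]))
```

In use, each segmented symbol would be passed through `moment_features`, the class means would be computed once from labeled training glyphs, and `classify` would then be applied to every symbol in the running text. A real system would likely combine these moments with the Hough-based features the abstract mentions.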
