Automatic recognition of printed Oriya script

This paper presents an optical character recognition (OCR) system for printed Oriya, a popular Indian script. Developing OCR for this script is difficult because a large number of characters must be recognized. In the proposed system, the digitized document image first passes through preprocessing modules such as skew correction, line segmentation, zone detection, and word and character segmentation. These modules combine conventional techniques with newly proposed ones. Individual characters are then recognized using a combination of stroke and run-number based features, along with features derived from the water reservoir concept. The feature detection methods are simple and robust. A prototype of the system has been tested on a variety of printed Oriya material and currently achieves 96.3% character-level accuracy on average.
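To illustrate what a run-number based feature might look like, here is a minimal sketch in Python. The abstract does not define the feature, so this assumes one common reading: for each row and each column of a binarized character image, count the maximal runs of consecutive ink pixels. The names `count_runs`, `run_number_features`, and `GLYPH` are hypothetical and are not taken from the paper.

```python
def count_runs(scanline):
    """Count maximal runs of consecutive ink pixels (1s) in one scan line."""
    runs = 0
    previous = 0
    for pixel in scanline:
        # A new run starts wherever an ink pixel follows a background pixel.
        if pixel == 1 and previous == 0:
            runs += 1
        previous = pixel
    return runs


def run_number_features(glyph):
    """Return (row_runs, column_runs) for a binary glyph given as a list of rows.

    Assumption: 1 marks ink and 0 marks background; the paper's actual
    feature definition may differ.
    """
    row_runs = [count_runs(row) for row in glyph]
    # zip(*glyph) transposes the image so each column becomes a scan line.
    column_runs = [count_runs(column) for column in zip(*glyph)]
    return row_runs, column_runs


# A small hypothetical glyph: a top bar, two side strokes, and a dot.
GLYPH = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 1, 0, 1],
]

row_runs, column_runs = run_number_features(GLYPH)
```

Counts of this kind depend only on how often a scan line crosses the character's strokes, so they are relatively insensitive to stroke thickness. That is one plausible reason such features suit printed text.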
