Multilingual OCR system for South Indian scripts and English documents: An approach based on Fourier transform and principal component analysis

Character recognition lies at the core of the discipline of pattern recognition, where the aim is to represent a sequence of characters taken from an alphabet [Kasturi, R., O'Gorman, L., Govindaraju, V., 2002. Document image analysis: a primer. Sadhana 27 (Part 1), 3-22]. Although many kinds of features have been developed and their performance on standard databases has been reported, there is still room to improve recognition rates by developing better features. In this paper, we present a multilingual character recognition system for printed South Indian scripts (Kannada, Telugu, Tamil and Malayalam) and English documents. South Indian languages are among the most widely used languages in India and are also spoken around the world. The proposed system is based on the Fourier transform and principal component analysis (PCA), two classical feature extraction and data representation techniques widely used in image processing, pattern recognition and computer vision. Experimental results show good recognition performance on the data sets considered.
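The abstract names the two building blocks but not how they are combined. The following is a minimal sketch of one plausible arrangement, and none of its details come from the paper: it assumes each character has already been segmented and scaled to a fixed-size binary image, takes the top-left k x k block of the 2-D FFT magnitude as the feature vector, fits PCA by singular value decomposition, and labels a new glyph with a 1-nearest-neighbour rule in the PCA subspace. The names `fourier_features`, `fit_pca` and `FourierPCAClassifier`, and the defaults `k=8` and `n_components=20`, are hypothetical.

```python
import numpy as np


def fourier_features(glyph: np.ndarray, k: int = 8) -> np.ndarray:
    """Low-frequency Fourier magnitudes of one size-normalized glyph image.

    Assumption (not from the paper): the glyph is a 2-D array already
    segmented and scaled to a fixed size, with ink pixels = 1. We keep the
    top-left k x k block of |FFT2|; magnitudes ignore circular shifts.
    """
    spectrum = np.abs(np.fft.fft2(glyph.astype(float)))
    return spectrum[:k, :k].ravel()


def fit_pca(features: np.ndarray, n_components: int):
    """Fit PCA by SVD of the mean-centred feature matrix (rows = samples)."""
    mean = features.mean(axis=0)
    # Rows of vt are orthonormal principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_components]


class FourierPCAClassifier:
    """Hypothetical Fourier + PCA + 1-nearest-neighbour character classifier."""

    def __init__(self, n_components: int = 20, k: int = 8):
        self.n_components = n_components
        self.k = k

    def _project(self, features: np.ndarray) -> np.ndarray:
        # Coordinates of the feature vectors in the PCA subspace.
        return (features - self.mean) @ self.components.T

    def fit(self, glyphs, labels):
        feats = np.stack([fourier_features(g, self.k) for g in glyphs])
        self.mean, self.components = fit_pca(feats, self.n_components)
        self.train_codes = self._project(feats)
        self.labels = list(labels)
        return self

    def predict(self, glyph: np.ndarray):
        code = self._project(fourier_features(glyph, self.k)[np.newaxis, :])
        # The nearest training sample in the reduced space decides the label.
        dists = np.linalg.norm(self.train_codes - code, axis=1)
        return self.labels[int(np.argmin(dists))]


def bar(size: int, index: int, vertical: bool) -> np.ndarray:
    """Toy glyph: a one-pixel-wide vertical or horizontal stroke."""
    glyph = np.zeros((size, size))
    if vertical:
        glyph[:, index] = 1
    else:
        glyph[index, :] = 1
    return glyph


if __name__ == "__main__":
    v, h = bar(16, 7, vertical=True), bar(16, 7, vertical=False)
    clf = FourierPCAClassifier(n_components=2).fit([v, h], ["|", "-"])
    # Circularly shifted copies have the same Fourier magnitudes.
    print(clf.predict(np.roll(v, 3, axis=1)))   # |
    print(clf.predict(np.roll(h, -2, axis=0)))  # -
```

Taking magnitudes discards phase, so a shifted copy of a glyph yields the same features, while PCA compresses the correlated Fourier coefficients into a few components, in the spirit of Turk's eigenfaces [13]. A real system would also need rotation and scale handling and a training set per script.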

[1]  R. Seethalakshmi, et al. Optical Character Recognition for printed Tamil text using Unicode, 2005.

[2]  Friedrich M. Wahl, et al. Document Analysis System, 1982, IBM J. Res. Dev.

[3]  V. K. Govindan, et al. Character recognition - A review, 1990, Pattern Recognit.

[4]  Bidyut Baran Chaudhuri, et al. Automatic recognition of printed Oriya script, 2001, Proceedings of Sixth International Conference on Document Analysis and Recognition.

[5]  Sargur N. Srihari, et al. Off-Line Cursive Script Word Recognition, 1989, IEEE Trans. Pattern Anal. Mach. Intell.

[6]  Chellapilla Patvardhan, et al. A multi-font OCR system for printed Telugu text, 2002, Language Engineering Conference, 2002. Proceedings.

[7]  C. V. Jawahar, et al. A bilingual OCR for Hindi-Telugu documents and its applications, 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.

[8]  U. Pal, et al. Recognition of printed Urdu script, 2003, Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.

[9]  Ching Y. Suen, et al. Historical review of OCR research and development, 1992, Proc. IEEE.

[10]  William I. Grosky, et al. Negotiating the semantic gap: from feature maps to semantic landscapes, 2002, Pattern Recognit.

[11]  P. S. Sastry, et al. A font and size-independent OCR system for printed Kannada documents using support vector machines, 2002.

[12]  Madasu Hanmandlu, et al. Unconstrained handwritten character recognition based on fuzzy logic, 2003, Pattern Recognit.

[13]  M. Turk, et al. Eigenfaces for Recognition, 1991, Journal of Cognitive Neuroscience.

[14]  B. K. Panigrahi, et al. Engineering Applications of Artificial Intelligence, 2010.

[15]  Atul Negi, et al. An OCR system for Telugu, 2001, Proceedings of Sixth International Conference on Document Analysis and Recognition.

[16]  Veena Bansal, et al. Segmentation of touching and fused Devanagari characters, 2002, Pattern Recognit.

[17]  Bidyut Baran Chaudhuri, et al. Indian script character recognition: a survey, 2004, Pattern Recognit.

[18]  Theodosios Pavlidis, et al. On the Recognition of Printed Characters of Any Font and Size, 1987, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Bidyut Baran Chaudhuri, et al. A complete printed Bangla OCR system, 1998, Pattern Recognit.

[20]  Bidyut Baran Chaudhuri, et al. An OCR system to read two Indian language scripts: Bangla and Devnagari (Hindi), 1997, Proceedings of the Fourth International Conference on Document Analysis and Recognition.

[21]  V. K. Govindan, et al. Character recognition - A review, 1990.

[22]  George Nagy, et al. An Autonomous Reading Machine, 1968, IEEE Transactions on Computers.

[23]  Venu Govindaraju, et al. Document image analysis: A primer, 2002.