Integrating knowledge sources in Devanagari text recognition system

The reading process has been widely studied, and researchers generally agree that knowledge in different forms and at different levels plays a vital role in it. This is the underlying philosophy of the Devanagari document recognition system described in this work. The knowledge sources we use are mostly statistical in nature or take the form of a word dictionary tailored specifically for optical character recognition (OCR). We do not perform any reasoning on these sources; instead, we explore their relative importance and their role in the hierarchy. Some of the knowledge sources are acquired a priori by an automated training process, while others are extracted from the text as it is processed. A complete Devanagari OCR system has been designed and tested on real-life printed documents of varying size and font, most of which were photocopies of the originals. The system achieves approximately 90% correct recognition.
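As a rough illustration of how a statistical knowledge source and a word dictionary can be combined, the sketch below rescores per-character classifier candidates against a dictionary. It is not the system described here: the function name `best_word`, the out-of-vocabulary penalty `oov_penalty`, the exhaustive search, and the Latin-letter toy data are all assumptions made for the example.

```python
from itertools import product
from math import log


def best_word(candidates, dictionary, oov_penalty=-5.0):
    """Pick the most plausible word from per-position character candidates.

    candidates: one list per character position, each holding
                (symbol, probability) pairs with probability > 0.
    dictionary: set of valid words (hypothetical stand-in for an OCR lexicon).
    oov_penalty: log-score added to words not found in the dictionary
                 (an assumed value, not taken from the paper).
    """
    best, best_score = None, float("-inf")
    # Enumerate every combination of one candidate per position.
    for combo in product(*candidates):
        word = "".join(symbol for symbol, _ in combo)
        # Statistical evidence: sum of log classifier probabilities.
        score = sum(log(prob) for _, prob in combo)
        # Dictionary evidence: penalise words the lexicon does not contain.
        if word not in dictionary:
            score += oov_penalty
        if score > best_score:
            best, best_score = word, score
    return best, best_score


# Toy example: the classifier alone prefers "kal", the dictionary knows "pat".
candidates = [
    [("k", 0.6), ("p", 0.4)],
    [("a", 0.9), ("o", 0.1)],
    [("l", 0.55), ("t", 0.45)],
]
word, score = best_word(candidates, {"pat"})
print(word, round(score, 3))
```

The exhaustive enumeration is only practical for short words with a few candidates per position; a real system would need to prune the search or partition the dictionary.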
