A Study of Different Kinds of Degradation in Printed Gurmukhi Script

The performance of any OCR system depends heavily on the printing quality of the input document. Many OCR systems have been designed that correctly recognize well-printed documents in both Indian and foreign scripts, but little work has been reported on the recognition of degraded documents. A standard OCR system built for well-printed machine text performs worse when it is tested on degraded documents. Degradation in a document can take many forms. In this paper, we identify the different kinds of degradation found in printed Gurmukhi script. For each kind of degradation, we discuss the problems it causes and some possible solutions. This paper should be useful to researchers working on the recognition of degraded documents in any script, because the same kinds of degradation occur in most of the world's scripts.