A Technique for Segmentation of Gurmukhi Text

This paper describes a technique for segmenting text in machine-printed Gurmukhi script documents. Segmentation of Gurmukhi script is difficult mainly because of characteristics of the script itself: characters connected along the headline, two or more characters in a word with intersecting minimum bounding rectangles, multi-component characters, and touching characters, which occur even in clean documents. Segmentation problems specific to the Gurmukhi script, such as horizontally overlapping text segments and touching characters in various zonal positions within a word, are discussed in detail, and a solution is proposed.
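
One of the problems named above, characters whose minimum bounding rectangles intersect, can be illustrated with a minimal sketch. This is not the technique proposed in the paper; it only shows, on a made-up five-column pixel layout, why a naive cut along empty pixel columns cannot separate two such characters. The component shapes and the helper names `bounding_box`, `boxes_intersect`, and `empty_columns` are assumptions introduced for illustration.

```python
# Illustrative sketch only: the two pixel sets below are hypothetical
# components, not data or an algorithm taken from the paper.

def bounding_box(pixels):
    """Return (top, left, bottom, right) of a set of (row, col) pixels."""
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return min(rows), min(cols), max(rows), max(cols)


def boxes_intersect(a, b):
    """True if two inclusive (top, left, bottom, right) boxes share a pixel."""
    top_a, left_a, bottom_a, right_a = a
    top_b, left_b, bottom_b, right_b = b
    return not (right_a < left_b or right_b < left_a or
                bottom_a < top_b or bottom_b < top_a)


def empty_columns(components, width):
    """Columns of the word image that contain no foreground pixel."""
    occupied = {c for pixels in components for _, c in pixels}
    return [c for c in range(width) if c not in occupied]


# Component A: an L-shaped stroke (vertical bar plus a foot to the right).
comp_a = {(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)}
# Component B: a small block sitting above A's foot, not touching A.
comp_b = {(0, 2), (1, 2), (0, 3), (1, 3), (0, 4), (1, 4)}

box_a = bounding_box(comp_a)
box_b = bounding_box(comp_b)

# The components are disjoint, yet their bounding rectangles intersect
# and no empty column exists where a vertical cut could be placed.
overlap = boxes_intersect(box_a, box_b)
cut_candidates = empty_columns([comp_a, comp_b], width=5)
```

Here `overlap` is true and `cut_candidates` is empty, so a purely column-based split would treat the two components as one character; some component-level analysis is needed instead.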
