Character segmentation for multilingual Indic and Roman scripts

Character segmentation has long been a critical area of Optical Character Recognition (OCR). In this paper, we present a character segmentation algorithm for Indic and Roman scripts. Segmentation is difficult for Indic scripts because their characters are connected by the Shirorekha (headline), and the bounding regions of two consecutive characters may overlap because of matraas. A horizontal projection profile is used to extract the Shirorekha, and a vertical projection profile is used to segment the characters. Results of the algorithm on scanned and facsimile documents in Devnagari script are shown.
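The two projection profiles described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a binarized, deskewed image of a single word stored as a NumPy array in which ink pixels are 1 and background pixels are 0, and it treats any row whose ink count is close to the maximum as part of the Shirorekha. The function names and the `frac` threshold are hypothetical, chosen only for illustration.

```python
import numpy as np


def find_headline_rows(binary, frac=0.8):
    """Return indices of rows likely to belong to the Shirorekha.

    Assumption: the headline rows are those whose horizontal projection
    (ink pixels per row) is at least `frac` of the largest row count.
    """
    h_profile = binary.sum(axis=1)          # horizontal projection profile
    peak = h_profile.max()
    if peak == 0:                           # blank image: no headline
        return np.array([], dtype=int)
    return np.flatnonzero(h_profile >= frac * peak)


def segment_characters(binary, frac=0.8):
    """Return (start, end) column ranges of characters, end exclusive.

    The headline rows are blanked first so that characters are no longer
    joined, then the vertical projection profile (ink pixels per column)
    is split at empty columns.
    """
    work = binary.copy()
    work[find_headline_rows(binary, frac), :] = 0
    v_profile = work.sum(axis=0)            # vertical projection profile
    ink = (v_profile > 0).astype(int)
    # Pad with zeros so runs touching the image border are also detected.
    edges = np.diff(np.concatenate(([0], ink, [0])))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), ends.tolist()))


if __name__ == "__main__":
    word = np.zeros((8, 11), dtype=int)
    word[1, :] = 1          # headline spanning the whole word
    word[2:7, 1:4] = 1      # first character body
    word[2:7, 6:10] = 1     # second character body
    print(segment_characters(word))
```

A plain vertical projection like this one only separates characters that leave at least one empty column between them once the headline is removed. Characters whose bounding regions overlap because of matraas would need additional handling that this sketch does not attempt.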
