Shirorekha Extraction in Character Segmentation for Printed Devanagari Text in Document Image Processing

Structural layout analysis, text line segmentation, word segmentation, and character segmentation are major steps in offline OCR systems for Devanagari script in document image processing. This paper presents a complete word and character segmentation system for machine-printed Devanagari text, including a novel character segmentation technique. Interline spacing can make line segmentation difficult, and fused characters can make character segmentation difficult. In the proposed technique, the Shirorekha (header line) of the Devanagari text is removed first, and bounding boxes are then placed around the segmented characters. The method applies basic morphological operations to the scanned document images, and we attribute the encouraging results to these operations. We have tested the method on documents written in Marathi.
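The header-line removal and bounding-box steps described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how the Shirorekha is located, so the sketch assumes it is the densest row band of a binarized word image (a horizontal projection heuristic), and it separates characters by runs of inked columns rather than by the morphological operations the paper uses. The function names `remove_shirorekha` and `character_boxes` and the parameter `band_ratio` are hypothetical.

```python
import numpy as np


def remove_shirorekha(word, band_ratio=0.8):
    """Blank the rows whose ink count is close to that of the densest row.

    Assumption: in a binarized word image (1 = ink), the header line is the
    row band with the highest horizontal projection. `band_ratio` is an
    illustrative threshold, not a value taken from the paper.
    """
    row_ink = word.sum(axis=1)
    peak = row_ink.max()
    out = word.copy()
    if peak > 0:
        out[row_ink >= band_ratio * peak, :] = 0
    return out


def character_boxes(word_no_header):
    """Return inclusive (top, left, bottom, right) boxes, one per run of inked columns."""
    col_ink = word_no_header.sum(axis=0) > 0
    boxes = []
    start = None
    # Append a trailing False so a run that touches the right edge is closed.
    for x, inked in enumerate(np.append(col_ink, False)):
        if inked and start is None:
            start = x
        elif not inked and start is not None:
            rows = np.flatnonzero(word_no_header[:, start:x].sum(axis=1))
            boxes.append((int(rows[0]), start, int(rows[-1]), x - 1))
            start = None
    return boxes


# Toy word image: a header line on row 1 joins three glyph stems.
word = np.zeros((8, 11), dtype=np.uint8)
word[1, :] = 1        # Shirorekha spanning the whole word
word[2:7, 1:4] = 1    # glyph 1
word[2:6, 5:7] = 1    # glyph 2
word[2:7, 8:10] = 1   # glyph 3

boxes = character_boxes(remove_shirorekha(word))
print(boxes)  # [(2, 1, 6, 3), (2, 5, 5, 6), (2, 8, 6, 9)]
```

With the header line left in place, every column of the toy image contains ink, so the same column scan returns a single box covering the whole word. This is why the header line is removed before the characters are boxed. Fused characters, which the abstract names as a difficulty, would still come out as one box under this simple column scan.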