Devnagari document segmentation using histogram approach

Document segmentation is one of the critical phases in machine recognition of any language. The accuracy of a character recognition technique depends on correct segmentation of the individual symbols. Segmentation decomposes an image of a sequence of characters into sub-images of individual symbols by segmenting lines and words. Devnagari is the most popular script in India. It is used for writing the Hindi, Marathi, Sanskrit and Nepali languages, and Hindi is the third most widely spoken language in the world. Devnagari documents consist of vowels, consonants and various modifiers, which makes proper segmentation of a Devnagari word challenging. This paper proposes a simple histogram-based approach to segmenting Devnagari documents and discusses various challenges in the segmentation of Devnagari script.
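The abstract does not give the details of the histogram approach, so the following is only a minimal sketch of a generic projection-profile segmentation, not the paper's method. It assumes a binarized, deskewed page stored as a NumPy array in which ink pixels are 1 and background pixels are 0. The function names (`runs`, `segment_lines`, `segment_words`) and the `min_gap` parameter are hypothetical and chosen for illustration.

```python
import numpy as np

def runs(mask):
    """Return (start, end) index pairs for each run of True values; end is exclusive."""
    padded = np.concatenate(([False], np.asarray(mask, dtype=bool), [False]))
    diff = np.diff(padded.astype(int))
    starts = np.flatnonzero(diff == 1)   # False -> True transitions
    ends = np.flatnonzero(diff == -1)    # True -> False transitions
    return list(zip(starts.tolist(), ends.tolist()))

def segment_lines(binary):
    """Split a page into text lines using the horizontal histogram (ink count per row)."""
    row_hist = binary.sum(axis=1)
    return runs(row_hist > 0)

def segment_words(line_img, min_gap=3):
    """Split one text line into words using the vertical histogram (ink count per column).

    Blank gaps narrower than min_gap columns are treated as gaps inside a word
    and merged (min_gap is an assumed, tunable threshold).
    """
    col_hist = line_img.sum(axis=0)
    merged = []
    for start, end in runs(col_hist > 0):
        if merged and start - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged

# Tiny synthetic page: two text lines, the first containing two words.
page = np.zeros((12, 20), dtype=np.uint8)
page[1:4, 2:6] = 1
page[1:4, 10:15] = 1
page[7:10, 1:18] = 1

lines = segment_lines(page)
words_per_line = [segment_words(page[top:bottom]) for top, bottom in lines]
```

Rows with zero ink separate lines, and columns with zero ink separate words. In Devnagari the header line usually joins the characters of a word, so column gaps tend to mark word boundaries, but modifiers above and below the header line and touching or skewed lines can break this simple assumption.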
