Text Line Segmentation of Handwritten Documents in Hindi and English

Text line segmentation is a key step in handwritten document processing. In this paper we present a method to detect and segment text lines in unconstrained handwritten documents written in Hindi and English. The document image is first binarized and its connected components are identified. Text lines are then identified using Hough lines. The skew angle is determined from the slope of the detected line, and the skew is then minimized. Segmentation is then performed, and the result is refined by removing noise, which mainly consists of components from adjacent lines.

Keywords: Handwritten document, Hough lines, text line segmentation, skew angle detection, connected component labeling, Hough peaks.
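The skew-estimation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each connected component has already been reduced to a single (x, y) point such as its centroid, and the function names `hough_peak_skew` and `deskew`, the angle range, and the bin sizes are all hypothetical choices made for illustration. The code votes in a small Hough accumulator, takes the strongest peak, reads the skew from that line's slope angle, and rotates the points to cancel it.

```python
import math
from collections import Counter


def hough_peak_skew(points, max_skew=15.0, angle_step=0.5, rho_step=2.0):
    """Return the slope angle (degrees) of the strongest Hough line.

    points: iterable of (x, y) coordinates, e.g. component centroids
    (an assumption of this sketch). Candidate slope angles run from
    -max_skew to +max_skew in increments of angle_step.
    """
    points = list(points)
    accumulator = Counter()
    n_steps = int(round(2 * max_skew / angle_step))
    for i in range(n_steps + 1):
        skew = -max_skew + i * angle_step
        # A line with slope angle `skew` has its normal at skew + 90 degrees.
        theta = math.radians(skew + 90.0)
        c, s = math.cos(theta), math.sin(theta)
        for x, y in points:
            # Normal form of a line: rho = x*cos(theta) + y*sin(theta).
            rho_bin = round((x * c + y * s) / rho_step)
            accumulator[(i, rho_bin)] += 1
    # The Hough peak is the accumulator cell with the most votes.
    (best_i, _), _ = max(accumulator.items(), key=lambda kv: kv[1])
    return -max_skew + best_i * angle_step


def deskew(points, skew_degrees):
    """Rotate points about the origin by -skew_degrees to cancel the skew."""
    a = math.radians(-skew_degrees)
    c, s = math.cos(a), math.sin(a)
    return [(x * c - y * s, x * s + y * c) for x, y in points]


if __name__ == "__main__":
    # Synthetic page: three parallel text lines with a slope angle of 5 degrees.
    slope = math.tan(math.radians(5.0))
    pts = [(x, x * slope + b) for b in (0, 50, 100) for x in range(0, 401, 10)]
    angle = hough_peak_skew(pts)
    print("estimated skew (degrees):", angle)
    print("first deskewed points:", deskew(pts, angle)[:3])
```

The angle is measured in the coordinate system of the input points. With image coordinates, where y grows downward, a positive value corresponds to a line that descends toward the right. Restricting the search to a narrow range around the horizontal keeps the accumulator small, but a page with a stronger skew would need a wider `max_skew`.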
