2009 10th International Conference on Document Analysis and Recognition Handwritten Text Line Identification In Indian Scripts

Preprocessing in handwritten text OCR involves line, word and character segmentation. This paper deals with text line identification of handwritten Indian scripts, especially of Bangla, as well as English, Hindi, Malayalam, etc. Here, a new dual method based on interdependency between text-line and inter-line gap is proposed. The method draws curves simultaneously through the text and inter-line gap points found from strip-wise histogram peaks and inter-peak valleys. The curves start from left and move right while one type of points guides the curve of other type so that the curves do not intersect. Then these curves are allowed to iteratively evolve so that the text-line curves cross more character strokes while inter-line curves cross less character strokes and yet keep the curves as straight as possible. After several iterations, the curves stabilize and define the final text-lines and inter-line gaps. The approach works well on text of different scripts with various geometric layouts, including poetry.

[1]  Sargur N. Srihari,et al.  On-Line and Off-Line Handwriting Recognition: A Comprehensive Survey , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[2]  Nina S. T. Hirata Document Processing via Trained Morphological Operators , 2007 .

[3]  Venu Govindaraju,et al.  Line separation for complex document images using fuzzy runlength , 2004, First International Workshop on Document Image Analysis for Libraries, 2004. Proceedings..

[4]  R. Manmatha,et al.  A scale space approach for automatically segmenting words from historical handwritten documents , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Laurence Likforman-Sulem,et al.  Text line segmentation of historical documents: a survey , 2007, International Journal of Document Analysis and Recognition (IJDAR).

[6]  Berrin A. Yanikoglu,et al.  Segmentation of off-line cursive handwriting using linear programming , 1998, Pattern Recognit..

[7]  Venu Govindaraju,et al.  Text extraction from gray scale historical document images using adaptive local connectivity map , 2005, Eighth International Conference on Document Analysis and Recognition (ICDAR'05).

[8]  Aurélie Lemaitre,et al.  Text line extraction in handwritten document with Kalman filter applied on low resolution image , 2006, Second International Conference on Document Image Analysis for Libraries (DIAL'06).

[9]  Abderrazak Zahour,et al.  Arabic hand-written text-line extraction , 2001, Proceedings of Sixth International Conference on Document Analysis and Recognition.

[10]  Jiang Yong,et al.  Text line extraction from multi-skewed handwritten documents , 2008, 2008 27th Chinese Control Conference.

[11]  Georgi Gluhchev,et al.  Handwritten document image segmentation and analysis , 1993, Pattern Recognit. Lett..

[12]  William A. Barrett,et al.  Separating lines of text in free-form handwritten historical documents , 2006, Second International Conference on Document Image Analysis for Libraries (DIAL'06).

[13]  Friedrich M. Wahl,et al.  Document Analysis System , 1982, IBM J. Res. Dev..

[14]  David Doermann,et al.  A New Algorithm for Detecting Text Line in Handwritten Documents , 2006 .