Text line detection in handwritten documents

In this paper, we present a new text line detection method for handwritten documents. The proposed technique is based on a strategy that consists of three distinct steps. The first step includes image binarization and enhancement, connected component extraction, partitioning of the connected component domain into three spatial sub-domains and average character height estimation. In the second step, a block-based Hough transform is used for the detection of potential text lines while a third step is used to correct possible splitting, to detect text lines that the previous step did not reveal and, finally, to separate vertically connected characters and assign them to text lines. The performance evaluation of the proposed approach is based on a consistent and concrete evaluation methodology.

[1]  Chun-Jen Chen,et al.  A linear-time component-labeling algorithm using contour tracing technique , 2004, Comput. Vis. Image Underst..

[2]  Laurence Likforman-Sulem,et al.  A Hough based algorithm for extracting text lines in handwritten documents , 1995, Proceedings of 3rd International Conference on Document Analysis and Recognition.

[3]  William A. Barrett,et al.  Separating lines of text in free-form handwritten historical documents , 2006, Second International Conference on Document Image Analysis for Libraries (DIAL'06).

[4]  Rangachar Kasturi,et al.  A Robust Algorithm for Text String Separation from Mixed Text/Graphics Images , 1988, IEEE Trans. Pattern Anal. Mach. Intell..

[5]  Ioannis Pratikakis,et al.  Adaptive degraded document image binarization , 2006, Pattern Recognit..

[6]  Ihsin T. Phillips,et al.  Empirical Performance Evaluation of Graphics Recognition Systems , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[7]  Ioannis Pratikakis,et al.  A segmentation-free approach for keyword search in historical typewritten documents , 2005, Eighth International Conference on Document Analysis and Recognition (ICDAR'05).

[8]  R. Manmatha,et al.  A scale space approach for automatically segmenting words from historical handwritten documents , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  C. Weliwitage,et al.  Handwritten Document Offline Text Line Segmentation , 2005, Digital Image Computing: Techniques and Applications (DICTA'05).

[10]  Subhadip Basu,et al.  Text line extraction from multi-skewed handwritten documents , 2007, Pattern Recognit..

[11]  Yi Li,et al.  Detecting Text Lines in Handwritten Documents , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[12]  Venu Govindaraju,et al.  Text extraction from gray scale historical document images using adaptive local connectivity map , 2005, Eighth International Conference on Document Analysis and Recognition (ICDAR'05).

[13]  Elisabetta Bruzzone,et al.  An algorithm for extracting cursive text lines , 1999, Proceedings of the Fifth International Conference on Document Analysis and Recognition. ICDAR '99 (Cat. No.PR00318).

[14]  Zhixin Shi,et al.  A natural learning algorithm based on Hough transform for text lines extraction in handwritten documents , 1999 .

[15]  Apostolos Antonacopoulos,et al.  ICDAR 2009 Page Segmentation Competition , 2003, 2009 10th International Conference on Document Analysis and Recognition.

[16]  Thierry Paquet,et al.  Text line segmentation in handwritten document using a production system , 2004, Ninth International Workshop on Frontiers in Handwriting Recognition.

[17]  Bin Chen,et al.  Recognition of handwritten Chinese characters via short line segments , 1992, Pattern Recognit..

[18]  Apostolos Antonacopoulos,et al.  Handwriting Segmentation Contest , 2007, ICDAR.

[19]  Aurélie Lemaitre,et al.  Text line extraction in handwritten document with Kalman filter applied on low resolution image , 2006, Second International Conference on Document Image Analysis for Libraries (DIAL'06).

[20]  Venu Govindaraju,et al.  Line separation for complex document images using fuzzy runlength , 2004, First International Workshop on Document Image Analysis for Libraries, 2004. Proceedings..

[21]  Basilios Gatos,et al.  ICDAR2005 page segmentation competition , 2007, Eighth International Conference on Document Analysis and Recognition (ICDAR'05).

[22]  Laurence Likforman-Sulem,et al.  Text Line Segmentation of Historical Arabic Documents , 2007 .