Text Line Detection in Unconstrained Handwritten Documents Using a Block-Based Hough Transform Approach

In this paper we present a new text line detection method for unconstrained handwritten documents. The proposed technique is based on a strategy that consists of three distinct steps. The first step includes image binarization and enhancement, connected component extraction and average character height estimation. In the second step, a block-based Hough transform is used for the detection of potential text lines while a third step is used to correct possible splitting, to detect text lines that the previous step did not reveal and, finally, to separate vertically connected characters and assign them to text lines. The performance evaluation of the proposed approach is based on a consistent and concrete evaluation methodology.

[1]  Elisabetta Bruzzone,et al.  An algorithm for extracting cursive text lines , 1999, Proceedings of the Fifth International Conference on Document Analysis and Recognition. ICDAR '99 (Cat. No.PR00318).

[2]  Ioannis Pratikakis,et al.  A segmentation-free approach for keyword search in historical typewritten documents , 2005, Eighth International Conference on Document Analysis and Recognition (ICDAR'05).

[3]  Venu Govindaraju,et al.  Text extraction from gray scale historical document images using adaptive local connectivity map , 2005, Eighth International Conference on Document Analysis and Recognition (ICDAR'05).

[4]  Bin Chen,et al.  Recognition of handwritten Chinese characters via short line segments , 1992, Pattern Recognit..

[5]  R. Manmatha,et al.  A scale space approach for automatically segmenting words from historical handwritten documents , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Venu Govindaraju,et al.  Line separation for complex document images using fuzzy runlength , 2004, First International Workshop on Document Image Analysis for Libraries, 2004. Proceedings..

[7]  Basilios Gatos,et al.  ICDAR2005 page segmentation competition , 2007, Eighth International Conference on Document Analysis and Recognition (ICDAR'05).

[8]  Laurence Likforman-Sulem,et al.  A Hough based algorithm for extracting text lines in handwritten documents , 1995, Proceedings of 3rd International Conference on Document Analysis and Recognition.

[9]  Rangachar Kasturi,et al.  A Robust Algorithm for Text String Separation from Mixed Text/Graphics Images , 1988, IEEE Trans. Pattern Anal. Mach. Intell..

[10]  Ioannis Pratikakis,et al.  Adaptive degraded document image binarization , 2006, Pattern Recognit..

[11]  Ihsin T. Phillips,et al.  Empirical Performance Evaluation of Graphics Recognition Systems , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[12]  Zhixin Shi,et al.  A natural learning algorithm based on Hough transform for text lines extraction in handwritten documents , 1999 .

[13]  Chun-Jen Chen,et al.  A linear-time component-labeling algorithm using contour tracing technique , 2004, Comput. Vis. Image Underst..