Text Line Segmentation in Images of Handwritten Historical Documents

This paper describes an original method for segmenting handwritten text lines in historical document images. After an initial preprocessing step, we compute a black/white transition map to obtain a rough detection of the line regions in the image. Using this map, the corresponding line axes are extracted with a skeletonization algorithm, and conflicts between adjacent cutting lines are resolved with heuristics. Our approach was tested on a set of digitized handwritten documents (from the PROHIST Project database) dating from the end of the 19th century onwards. The proposed method worked well even on difficult images, correctly segmenting 82.18% of the lines in our database. Compared with another recent proposal for automatic line extraction on the same test images, our method improved correct segmentation by more than 38%.
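As a rough illustration of the transition-map idea, the sketch below counts black/white transitions along each row of a binarized image and groups consecutive rows that contain transitions into candidate line bands. It is a minimal sketch under stated assumptions, not the method of the paper: the function names, the `min_transitions` parameter, and the reduction of the map to a per-row profile are all hypothetical choices made for illustration, and the skeletonization and conflict-resolution steps are omitted.

```python
import numpy as np

def transition_map(binary):
    """Mark pixels that differ from their left neighbour.

    binary: 2-D array, nonzero = ink, 0 = background (assumed convention).
    """
    ink = np.asarray(binary).astype(bool)
    trans = np.zeros(ink.shape, dtype=int)
    # A transition is a change ink<->background between horizontal neighbours.
    trans[:, 1:] = ink[:, 1:] != ink[:, :-1]
    return trans

def row_transition_profile(binary):
    """Number of black/white transitions in each image row."""
    return transition_map(binary).sum(axis=1)

def rough_line_bands(binary, min_transitions=1):
    """Group consecutive rows with enough transitions into (top, bottom) bands.

    min_transitions is a hypothetical threshold, not a value from the paper.
    """
    active = row_transition_profile(binary) >= min_transitions
    bands, start = [], None
    for y, on in enumerate(active):
        if on and start is None:
            start = y                      # a band begins
        elif not on and start is not None:
            bands.append((start, y - 1))   # the band ended on the previous row
            start = None
    if start is not None:                  # band touching the bottom edge
        bands.append((start, len(active) - 1))
    return bands
```

A per-row profile like this only separates lines that are roughly horizontal and do not touch. Skewed or overlapping lines are the cases in which a local map followed by axis extraction, as described in the abstract, would be needed.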
