Text Line Segmentation in Handwritten Documents Based on Connected Components Trajectory Generation

Text line segmentation in handwritten documents is an important step in many high-level processing tasks, such as handwritten document enhancement and text recognition. In this paper we describe a novel approach to text line segmentation based on tracking. We treat each connected component in the image as an object moving along its text line and find its best match given its motion history, i.e. the closest connected component that lies on its trajectory. The direction of motion gives the direction of the handwritten text and is the output of our tracking algorithm. We apply our approach to images from the ICDAR 2013 Handwriting Segmentation Contest and report an overall detection rate of \(86.51\%\).
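To make the tracking idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes each connected component has already been reduced to a centroid `(x, y)`, that text runs left to right, and that a simple smoothed heading is an acceptable stand-in for the motion history. The function name `track_lines` and the parameters `max_gap` and `max_angle_deg` are hypothetical and chosen only for illustration.

```python
import math


def track_lines(centroids, max_gap=60.0, max_angle_deg=30.0):
    """Greedily chain component centroids (x, y) into lines, left to right.

    Returns a list of lines, each a list of indices into `centroids`.
    Illustrative sketch only: the thresholds are assumed, not from the paper.
    """
    max_angle = math.radians(max_angle_deg)
    order = sorted(range(len(centroids)), key=lambda i: centroids[i])
    unused = set(order)
    lines = []
    for start in order:
        if start not in unused:
            continue
        unused.discard(start)
        line = [start]
        direction = 0.0  # assumed initial heading: horizontal, in radians
        while True:
            cx, cy = centroids[line[-1]]
            best, best_dist, best_angle = None, max_gap, 0.0
            for j in order:
                if j not in unused:
                    continue
                dx = centroids[j][0] - cx
                dy = centroids[j][1] - cy
                dist = math.hypot(dx, dy)
                # Only look ahead (to the right) and within the gap limit.
                if dx <= 0 or dist >= best_dist:
                    continue
                angle = math.atan2(dy, dx)
                # Keep the candidate only if it lies near the current heading.
                if abs(angle - direction) <= max_angle:
                    best, best_dist, best_angle = j, dist, angle
            if best is None:
                break
            # Blend the new step into the heading (a crude motion history).
            direction = 0.5 * direction + 0.5 * best_angle
            unused.discard(best)
            line.append(best)
        lines.append(line)
    return lines


# Two slightly wavy lines of four components each.
points = [(0, 10), (20, 12), (40, 11), (60, 13),
          (0, 60), (20, 58), (40, 61), (60, 59)]
lines = track_lines(points, max_gap=30.0)
```

In this toy input the components of each row are chained together because the nearest component ahead of each one lies close to the current heading, while components of the other row fall outside the gap limit. A fuller treatment would also need to handle components that touch several lines, which this sketch does not attempt.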
