Adaptive Script-Independent Text Line Extraction

This paper proposes an adaptive, block-based text line extraction algorithm. Three global and two local parameters are defined to adapt the method to different handwriting styles and languages. A document image is first segmented into several overlapping blocks. The skew of each block is estimated, and the block is de-skewed using the estimated angle. Text regions are then detected in each de-skewed block, and a number of data points are extracted from them. These data points are used to estimate the paths of the text lines. Thinning the background of the image that contains the text line paths yields the text line boundaries, or separators. Finally, an algorithm is proposed to assign connected components that intersect the estimated separators to the extracted text lines. Extensive experiments on standard datasets in various languages demonstrate that the proposed algorithm outperforms previous methods.
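The block-based stage described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a binary image stored as a NumPy array with foreground pixels equal to 1, it assumes vertical strips as blocks, and it uses a simple horizontal projection profile to detect text regions, which the abstract does not specify. Skew estimation and de-skewing are omitted. The names `split_into_blocks`, `text_regions`, `data_points`, `block_width`, `overlap`, and `min_ink` are hypothetical.

```python
import numpy as np


def split_into_blocks(width, block_width, overlap):
    """Return (start, end) column ranges of overlapping vertical blocks."""
    step = block_width - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than block_width")
    blocks = []
    start = 0
    while True:
        end = min(start + block_width, width)
        blocks.append((start, end))
        if end == width:
            break
        start += step
    return blocks


def text_regions(block, min_ink=1):
    """Return (top, bottom) row ranges whose row sum reaches min_ink.

    Assumption: a horizontal projection profile stands in for the
    text-region detector, which the abstract leaves unspecified.
    """
    is_text = block.sum(axis=1) >= min_ink
    regions = []
    start = None
    for row, flag in enumerate(is_text):
        if flag and start is None:
            start = row
        elif not flag and start is not None:
            regions.append((start, row))
            start = None
    if start is not None:
        regions.append((start, len(is_text)))
    return regions


def data_points(image, block_width, overlap):
    """Collect one (x, y) point per text region per block.

    Each point is the centre of the region inside its block. These
    points are the kind of samples from which text line paths could
    later be estimated.
    """
    points = []
    for x0, x1 in split_into_blocks(image.shape[1], block_width, overlap):
        for r0, r1 in text_regions(image[:, x0:x1]):
            points.append(((x0 + x1 - 1) / 2, (r0 + r1 - 1) / 2))
    return points
```

On a synthetic page with two straight text lines, each block contributes one point per line, so the points trace the lines from left to right. On real handwriting, each block would first be de-skewed with its own estimated angle before the regions are detected.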
