Text Line Detection in Corrupted and Damaged Historical Manuscripts

Most text line detection algorithms take binary images as input. For severely degraded documents, however, binarization often introduces significant noise and other artifacts. In this work we present a novel method that detects text lines directly in gray scale images. The method consists of two stages. In the first stage, potential characters are detected by analyzing the evolution maps of connected components obtained with a sliding threshold. In the second stage, the detected potential characters are grouped into text lines using a sweep-line approach. The suggested method is especially effective on torn and damaged documents that other algorithms cannot handle.
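The two stages can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify how evolution maps are built or how the sweep line groups characters, so the code below makes simplifying assumptions. The "evolution" is reduced to the number of dark connected components at each threshold, and the sweep simply moves left to right and attaches each component to the first line whose most recent component overlaps it vertically. All function names (`components`, `component_counts`, `group_into_lines`) and the toy image are hypothetical.

```python
from collections import deque


def components(img, t):
    """Bounding boxes (top, left, bottom, right) of 4-connected regions with gray value <= t."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if img[r][c] > t or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited dark pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and img[ny][nx] <= t:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def component_counts(img, thresholds):
    """Number of dark components at each threshold (a crude stand-in for an evolution map)."""
    return [len(components(img, t)) for t in thresholds]


def group_into_lines(boxes):
    """Sweep left to right; attach each box to the first line whose latest box overlaps it vertically."""
    lines = []
    for box in sorted(boxes, key=lambda b: b[1]):
        for line in lines:
            last = line[-1]
            if box[0] <= last[2] and last[0] <= box[2]:
                line.append(box)
                break
        else:
            lines.append([box])
    return lines


# Toy 7x7 image: 0 = ink, 200 = background, 120 = faint stain bridging two characters.
B, S = 200, 120
image = [
    [B, B, B, B, B, B, B],
    [B, 0, S, 0, B, 0, B],
    [B, 0, B, 0, B, 0, B],
    [B, B, B, B, B, B, B],
    [B, B, B, B, B, B, B],
    [B, 0, B, 0, B, B, B],
    [B, B, B, B, B, B, B],
]

counts = component_counts(image, [50, 150, 250])
lines = group_into_lines(components(image, 50))
```

In this toy image, raising the threshold from 50 to 150 lets the stain merge two characters, and at 250 the whole page collapses into a single component. This is the kind of change across thresholds that a single fixed binarization would hide.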
