Influence of text line segmentation in Handwritten Text Recognition

Text line segmentation is the process by which text lines in a document image are localized and extracted. It is an important step in off-line Handwritten Text Recognition (HTR), given that the input to these systems is the line image of the text to be transcribed. A myriad of solutions to the text line segmentation problem have been proposed in the literature. Although these solutions may differ greatly in how the segmentation is actually performed, they can be classified by the level of precision and detail of the final extracted lines. In this paper we study how much precision and detail a real HTR task actually requires from the segmentation. We test three text line segmentation techniques whose outputs range from a simple rectangle for each line to a perfectly fitted polygon surrounding the detected lines. Experiments were carried out on a historical collection, and the results show that good HTR accuracy can be obtained with simple extraction algorithms.
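To make the two ends of this precision range concrete, the following minimal sketch contrasts a rectangular line extraction with a polygon-fitted one. It is an illustration only and not one of the three techniques evaluated in the paper: the function names, the image representation (a list of grayscale rows), and the convention that polygon vertices lie on pixel corners are all assumptions introduced here.

```python
from typing import List, Tuple

Point = Tuple[int, int]   # (x, y) pixel-corner coordinates (assumed convention)
Image = List[List[int]]   # grayscale rows; 255 is taken to be background

def bounding_rectangle(polygon: List[Point]) -> Tuple[int, int, int, int]:
    """Coarsest output: the axis-aligned box (x0, y0, x1, y1), x1/y1 exclusive."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)

def crop_rectangle(image: Image, rect: Tuple[int, int, int, int]) -> Image:
    """Extract every pixel inside the rectangle, including neighbouring-line ink."""
    x0, y0, x1, y1 = rect
    return [row[x0:x1] for row in image[y0:y1]]

def point_in_polygon(x: float, y: float, polygon: List[Point]) -> bool:
    """Even-odd ray casting: count polygon edges crossed by a ray going right."""
    inside = False
    n = len(polygon)
    for i in range(n):
        xa, ya = polygon[i]
        xb, yb = polygon[(i + 1) % n]
        if (ya > y) != (yb > y):  # edge spans the horizontal line through y
            x_cross = xa + (y - ya) * (xb - xa) / (yb - ya)
            if x < x_cross:
                inside = not inside
    return inside

def crop_polygon(image: Image, polygon: List[Point], background: int = 255) -> Image:
    """Finest output: keep pixels whose centre lies inside the polygon,
    and paint everything else in the bounding box as background."""
    x0, y0, x1, y1 = bounding_rectangle(polygon)
    return [
        [image[y][x] if point_in_polygon(x + 0.5, y + 0.5, polygon) else background
         for x in range(x0, x1)]
        for y in range(y0, y1)
    ]
```

Both functions return an image of the same size; the difference is that the polygon crop masks out pixels belonging to adjacent lines, which is the extra detail whose benefit for HTR the paper sets out to measure.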
