Binarization effects on results of text-line segmentation methods applied on historical documents

Scanning a document produces a digital image, and image processing can be regarded as two-dimensional signal processing. These images are the input of document analysis and recognition systems, one objective of which is information extraction. Text-line extraction is a crucial step in such systems because its output is the input of the recognition step. Most segmentation methods take binary images as input, which is why the binarization method can affect segmentation results. In this paper, we study how the choice of binarization method affects the results of text-line segmentation methods. Several evaluation metrics are used to compare the segmentation results. The proposed approach is tested on the benchmarking databases IAM (about 556 images) and IAM historical (about 60 images). The results show that binarization affects the detection rate (DR) and recognition accuracy (RA) metrics used for segmentation evaluation.
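The abstract names DR and RA without defining them. The following is a minimal sketch, assuming the definitions commonly used in handwriting segmentation contests: a ground-truth line and a detected line count as a one-to-one match when their pixel overlap (intersection over union) reaches a threshold, DR is the number of such matches divided by the number of ground-truth lines, and RA is the same count divided by the number of detected lines. The function names, the 0.95 threshold, and the toy data are illustrative assumptions, not the paper's evaluation code.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, col) of a foreground pixel


def match_score(gt: Set[Pixel], res: Set[Pixel]) -> float:
    """Intersection over union of two foreground-pixel sets."""
    union = gt | res
    return len(gt & res) / len(union) if union else 0.0


def evaluate(gt_lines: List[Set[Pixel]],
             res_lines: List[Set[Pixel]],
             threshold: float = 0.95) -> Tuple[float, float, float]:
    """Return (DR, RA, FM) under the assumed contest-style definitions."""
    o2o = 0          # number of one-to-one matches
    used = set()     # indices of detected lines already matched
    for g in gt_lines:
        for j, r in enumerate(res_lines):
            if j not in used and match_score(g, r) >= threshold:
                used.add(j)
                o2o += 1
                break
    n, m = len(gt_lines), len(res_lines)
    dr = o2o / n if n else 0.0   # detection rate
    ra = o2o / m if m else 0.0   # recognition accuracy
    fm = 2 * dr * ra / (dr + ra) if dr + ra else 0.0  # harmonic mean
    return dr, ra, fm


# Toy example (hypothetical data): two ground-truth lines, ten pixels each.
line1 = {(0, c) for c in range(10)}
line2 = {(2, c) for c in range(10)}
gt_lines = [line1, line2]

# The segmenter finds line1 exactly but splits line2 into two halves,
# as might happen when binarization breaks strokes apart.
res_lines = [
    line1,
    {(2, c) for c in range(5)},
    {(2, c) for c in range(5, 10)},
]

dr, ra, fm = evaluate(gt_lines, res_lines)
print(f"DR={dr:.2f} RA={ra:.2f} FM={fm:.2f}")
```

In this toy case the split line lowers both metrics: only one of two ground-truth lines is detected, and only one of three detected regions is correct. This is the kind of effect a change in binarization could produce in the segmentation scores.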