Effect of "Ground Truth" on Image Binarization

Image binarization has a large effect on the subsequent stages of document image analysis for character recognition, and binarization algorithm development remains a major focus of research. Binarization has been evaluated by comparing the output of OCR systems on images binarized with different methods. This approach has been criticized because it does not evaluate the binarization alone, but rather how the binarization interacts with the downstream processes. Recently, pixel-accurate "ground truth" images have been introduced for evaluating binarization algorithms; however, this ground truth has itself been shown to be open to interpretation. The choice of binarization ground truth affects binarization algorithm design, either directly, when the design is produced by an automated algorithm trying to match the provided ground truth, or indirectly, when human designers adjust their designs to perform better on the provided data. Three variations of pixel-accurate ground truth were used to train a binarization classifier. The resulting performance can vary significantly depending on the choice of ground truth, which in turn can influence binarization design choices.
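The direct effect described above can be illustrated with a toy sketch. It assumes a pixel-level F-measure (one of the measures used in the DIBCO contests) as the evaluation metric and a single global threshold as the "binarization algorithm" being designed automatically. The stroke profile and the three ground-truth variants (`GT_THIN`, `GT_MEDIUM`, `GT_WIDE`) are hypothetical and are not the ground truths or the classifier used in this study; they differ only in how much of a blurred stroke edge an annotator labels as ink.

```python
from typing import List

# Grayscale image: 0 = black, 255 = white.
Gray = List[List[int]]
# Binary image: 1 = foreground (ink), 0 = background.
Binary = List[List[int]]


def binarize(gray: Gray, threshold: int) -> Binary:
    """Global thresholding: pixels darker than `threshold` become ink (1)."""
    return [[1 if p < threshold else 0 for p in row] for row in gray]


def f_measure(result: Binary, truth: Binary) -> float:
    """Pixel-level F-measure of `result` against a ground-truth image."""
    tp = fp = fn = 0
    for res_row, gt_row in zip(result, truth):
        for r, g in zip(res_row, gt_row):
            if r and g:
                tp += 1  # ink in both
            elif r:
                fp += 1  # ink in result only
            elif g:
                fn += 1  # ink in ground truth only
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def fit_threshold(gray: Gray, truth: Binary) -> int:
    """Toy 'automated design': pick the global threshold whose output best
    matches the given ground truth (ties go to the lowest threshold)."""
    best_t, best_f = 0, -1.0
    for t in range(1, 257):
        f = f_measure(binarize(gray, t), truth)
        if f > best_f:
            best_t, best_f = t, f
    return best_t


# Hypothetical cross-section of one stroke: dark core, blurred gray edges.
PROFILE = [240, 200, 120, 30, 30, 120, 200, 240]
GRAY: Gray = [list(PROFILE) for _ in range(3)]

# Hypothetical ground-truth variants: only the labeling of edge pixels differs.
GT_THIN: Binary = [[0, 0, 0, 1, 1, 0, 0, 0] for _ in range(3)]
GT_MEDIUM: Binary = [[0, 0, 1, 1, 1, 1, 0, 0] for _ in range(3)]
GT_WIDE: Binary = [[0, 1, 1, 1, 1, 1, 1, 0] for _ in range(3)]

if __name__ == "__main__":
    variants = {"thin": GT_THIN, "medium": GT_MEDIUM, "wide": GT_WIDE}
    # Each ground truth steers the automated design to a different threshold.
    for name, gt in variants.items():
        print(f"threshold fitted to {name} ground truth: {fit_threshold(GRAY, gt)}")
    # One fixed binarization scores differently against each ground truth.
    fixed = binarize(GRAY, 128)
    for name, gt in variants.items():
        print(f"F-measure of threshold 128 vs {name}: {f_measure(fixed, gt):.3f}")
```

In this toy setup the fitted thresholds come out as 31, 121, and 201, and the single output produced at threshold 128 scores about 0.667, 1.000, and 0.800 against the three variants. The same binarization thus looks poor, perfect, or intermediate purely because of labeling choices at stroke edges.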
