Evaluating glyph binarizations based on their properties

Binarized document images produced by different algorithms are commonly evaluated against a pre-existing ground truth. Previous research has identified several pitfalls in this methodology and suggested various ways of addressing them. This article proposes an alternative way to evaluate the quality of binarized glyphs that does not require a ground truth. Our method relies on intrinsic properties of the binarized glyphs. The features used for quality assessment are stroke width consistency, the presence of small connected components (stains), edge noise, and the average edge curvature. Linear and tree-based combinations of these features are also considered. The new methodology is tested and shown to be nearly as sound as human experts' judgments.
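
As an illustration of what a ground-truth-free feature can look like, the following minimal sketch measures the "stains" property: the share of foreground pixels that belong to small connected components. It is not the article's definition of the feature. The function names, the 8-connectivity choice, and the `max_stain_area` threshold are all assumptions made for the example.

```python
from collections import deque


def component_sizes(img):
    """Return the pixel count of every 8-connected foreground (value 1) component."""
    rows = len(img)
    cols = len(img[0]) if img else 0
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            sizes.append(size)
    return sizes


def stain_fraction(img, max_stain_area=2):
    """Fraction of foreground pixels lying in components of at most max_stain_area pixels.

    max_stain_area is a hypothetical threshold chosen for illustration only.
    """
    sizes = component_sizes(img)
    total = sum(sizes)
    if total == 0:
        return 0.0
    return sum(s for s in sizes if s <= max_stain_area) / total


# Toy glyph: one 7-pixel stroke shape plus a single isolated pixel.
glyph = [
    [1, 1, 1, 0, 0],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 0, 0],
]
print(sorted(component_sizes(glyph)))
print(stain_fraction(glyph))
```

A lower value suggests a cleaner binarization under this particular measure. In practice the threshold would need to scale with image resolution and stroke width, and the score would be only one of several inputs to a combined quality estimate.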
