Automatic Document Image Binarization

Document image binarization is often challenging because documents suffer from many forms of degradation. Although several binarization techniques exist in the literature, their output is typically sensitive to the control parameter settings of the chosen technique. This paper presents an automatic document image binarization algorithm that segments text from heavily degraded document images. The proposed technique uses a two band-pass filtering approach to remove background noise, and Bayesian optimization to select its hyperparameters automatically. Its effectiveness is demonstrated empirically on the Document Image Binarization Competition (DIBCO) and Handwritten Document Image Binarization Competition (H-DIBCO) datasets.
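The general idea of filtering followed by automatic parameter selection can be illustrated with a small sketch. It is not the paper's implementation and rests on several assumptions: a difference of two box blurs stands in for the band-pass filtering, plain random search stands in for Bayesian optimization, and the objective is taken to be the F-measure against a ground-truth mask. All function names, parameter ranges, and the synthetic test image are hypothetical.

```python
import random


def make_synthetic_page(size=12):
    """Hypothetical test image: a brightness gradient with one dark stroke."""
    img = [[200.0 - 5.0 * x for x in range(size)] for _ in range(size)]
    truth = [[0] * size for _ in range(size)]
    for y in range(2, 10):
        img[y][6] = 30.0   # dark text stroke
        truth[y][6] = 1    # ground-truth text pixel
    return img, truth


def box_blur(img, radius):
    """Mean filter over a square window, clipped at the image borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - radius), min(h, y + radius + 1))
                    for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = sum(vals) / len(vals)
    return out


def band_pass(img, r_small, r_large):
    """Difference of two blurs (assumed stand-in for a band-pass filter)."""
    fine, coarse = box_blur(img, r_small), box_blur(img, r_large)
    return [[f - c for f, c in zip(frow, crow)]
            for frow, crow in zip(fine, coarse)]


def binarize(img, r_small, r_large, thresh):
    """Mark pixels much darker than their surroundings as text (1)."""
    return [[1 if v < -thresh else 0 for v in row]
            for row in band_pass(img, r_small, r_large)]


def f_measure(pred, truth):
    """Harmonic mean of pixel-level precision and recall."""
    tp = fp = fn = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            tp += p and t
            fp += p and not t
            fn += (not p) and t
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def random_search(img, truth, n_trials=50, seed=0):
    """Random search over the three parameters (not Bayesian optimization)."""
    rng = random.Random(seed)
    best_score, best_params = -1.0, None
    for _ in range(n_trials):
        r_small = rng.randint(0, 1)
        r_large = rng.randint(r_small + 1, 4)
        thresh = rng.uniform(0.0, 100.0)
        score = f_measure(binarize(img, r_small, r_large, thresh), truth)
        if score > best_score:
            best_score, best_params = score, (r_small, r_large, thresh)
    return best_score, best_params
```

Calling `random_search(*make_synthetic_page())` returns the best score found and the parameter triple that produced it. A Bayesian optimizer would replace the uniform sampling with proposals guided by a surrogate model of the score, which matters when each evaluation is expensive.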
