Historical document image binarization using background estimation and energy minimization

This paper presents an enhanced historical document image binarization technique that combines background estimation with energy minimization. Given a degraded historical document image, mathematical morphology is first applied to compensate for the document background, using a disk-shaped mask whose size is determined by the stroke width transform (SWT). Laplacian energy based segmentation is then performed on the enhanced document image. Finally, post-processing is applied to further improve the binarization results. The proposed technique has been extensively evaluated on the recent DIBCO and H-DIBCO benchmark datasets. Experimental results show that the proposed method outperforms other state-of-the-art document image binarization techniques on these benchmarks.
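The background compensation step can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes dark text on a lighter background, treats the disk-shaped mask as the structuring element of a grayscale morphological closing, and divides the image by the estimated background. The function names, the `radius` parameter (assumed to come from a separate stroke width estimate such as the SWT), and the division-based normalization are illustrative assumptions, not details taken from the paper.

```python
def disk_offsets(radius):
    """Return the (dy, dx) offsets covered by a disk of the given radius."""
    r = int(radius)
    return [(dy, dx)
            for dy in range(-r, r + 1)
            for dx in range(-r, r + 1)
            if dy * dy + dx * dx <= r * r]


def _morph(image, offsets, pick):
    """Apply `pick` (max or min) over the disk neighbourhood of each pixel.

    Neighbours that fall outside the image are ignored.
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = pick(
                image[y + dy][x + dx]
                for dy, dx in offsets
                if 0 <= y + dy < h and 0 <= x + dx < w
            )
    return out


def estimate_background(image, radius):
    """Grayscale closing: dilation (max) followed by erosion (min).

    With a disk wider than the strokes, dark text is filled in and
    only the slowly varying background remains.
    """
    offsets = disk_offsets(radius)
    return _morph(_morph(image, offsets, max), offsets, min)


def compensate_background(image, radius, white=255):
    """Divide each pixel by the estimated background (assumed normalization).

    Pixels equal to the background map to `white`; text stays dark.
    """
    background = estimate_background(image, radius)
    return [
        [min(white, round(white * p / b)) if b > 0 else white
         for p, b in zip(row, bg_row)]
        for row, bg_row in zip(image, background)
    ]


if __name__ == "__main__":
    # Hypothetical 7x7 test image: a left-to-right brightness gradient
    # with a one-pixel-wide dark vertical stroke in column 3.
    row = [200, 190, 180, 40, 160, 150, 140]
    image = [list(row) for _ in range(7)]
    for line in compensate_background(image, radius=2):
        print(line)
```

In this sketch the radius has to exceed the stroke width, otherwise the closing leaves strokes in the background estimate and the division washes them out. The nested loops are written for readability; a practical version would use an image processing library.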
