Combination of Document Image Binarization Techniques

Document image binarization has been studied for decades, and many practical binarization techniques have been proposed for different kinds of document images. However, many state-of-the-art methods are particularly suited to document images that suffer from a specific type of degradation or have specific image characteristics. In this paper, we propose a classification framework that combines different thresholding methods to improve document image binarization performance. Given the binarization results of several reported methods, the proposed framework divides the document image pixels into three sets: foreground pixels, background pixels, and uncertain pixels. A classifier is then applied iteratively to assign the uncertain pixels to foreground or background, based on the pre-selected foreground and background sets. Extensive experiments over different datasets, including the Document Image Binarization Contest (DIBCO) 2009 and the Handwritten Document Image Binarization Competition (H-DIBCO) 2010, show that our proposed framework significantly outperforms most state-of-the-art methods.
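The pixel partition and the iterative labeling described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how the three sets are selected or which classifier and features are used. The sketch assumes that a pixel is pre-selected only when all input binarizations agree, and it swaps in a simple nearest-mean rule on grayscale intensity as the classifier. The function names `split_pixels` and `classify_uncertain` are hypothetical.

```python
import numpy as np


def split_pixels(binarizations):
    """Label each pixel 1 (foreground), 0 (background) or -1 (uncertain).

    Assumption: a pixel is pre-selected only if every input binary map
    agrees on it (True = foreground); all other pixels are uncertain.
    """
    stack = np.stack([np.asarray(b, dtype=bool) for b in binarizations])
    labels = np.full(stack.shape[1:], -1, dtype=np.int8)
    labels[stack.all(axis=0)] = 1    # all methods say foreground
    labels[~stack.any(axis=0)] = 0   # all methods say background
    return labels


def classify_uncertain(gray, labels):
    """Iteratively assign uncertain pixels with a nearest-mean rule.

    Assumption: the classifier is a stand-in. Each round recomputes the
    mean intensity of the current foreground and background sets, then
    commits the more confident half of the remaining uncertain pixels.
    """
    gray = np.asarray(gray, dtype=float)
    labels = labels.copy()
    if not (labels == 1).any() or not (labels == 0).any():
        raise ValueError("need at least one foreground and one background seed")
    while (labels == -1).any():
        uncertain = labels == -1
        d_fg = np.abs(gray - gray[labels == 1].mean())
        d_bg = np.abs(gray - gray[labels == 0].mean())
        margin = np.abs(d_fg - d_bg)           # larger = more confident
        cutoff = np.median(margin[uncertain])  # at least one pixel passes
        commit = uncertain & (margin >= cutoff)
        labels[commit] = np.where(d_fg[commit] <= d_bg[commit], 1, 0)
    return labels


# Toy example: dark text on a bright page, two global thresholds that
# disagree on one mid-gray pixel.
gray = np.array([[10, 200, 90],
                 [20, 210, 180]])
seeds = split_pixels([gray < 100, gray < 50])
final = classify_uncertain(gray, seeds)
```

In the toy example the two thresholded maps disagree only on the pixel with intensity 90, so it starts as uncertain and is then assigned to the foreground because it lies closer to the mean of the agreed foreground pixels. Committing only the more confident half of the uncertain pixels per round lets later decisions benefit from the enlarged foreground and background sets, which is one simple way to make the classification iterative.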
