Self Learning Classification for Degraded Document Images by Sparse Representation

Document image binarization is a technique for segmenting text from the background of a document image. It is a challenging task because of the high intensity variation in both the document foreground and the background. Recently, a series of document image binarization contests (DIBCOs) has been held, drawing great research interest in this area. Several document binarization techniques have been proposed and achieve strong performance on the contest datasets. However, these techniques may not perform well on all kinds of degraded document images, because it is difficult to design a classification method that correctly models both the non-uniformly degraded document background and the text foreground. In this paper, we propose a self-learning classification framework that combines the binary outputs of different binarization methods. The proposed framework makes use of sparse representation to re-classify the document pixels and produces better binary results. Experimental results on the recent DIBCO contests show the strong performance and robustness of the proposed framework on different kinds of degraded document images.
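The combine-then-re-classify idea can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the following: a pixel counts as confidently labeled only when every input binarization method agrees on it. Each pixel comes with a caller-supplied feature vector, since the actual features are not described here. Disputed pixels are re-classified with the minimum-residual rule of sparse-representation-based classification, and the l1-regularized least-squares problem is solved with plain ISTA as a stand-in solver. All function names and the `lam` parameter are hypothetical.

```python
import math


def normalize(v):
    """Scale a feature vector to unit l2 norm (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def soft_threshold(z, t):
    """Proximal operator of t*|x|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def sparse_code(atoms, y, lam=0.01, iters=500):
    """Approximately solve min_x 0.5*||y - D x||^2 + lam*||x||_1 with ISTA.

    `atoms` are the columns of the dictionary D, each a unit-norm list of floats.
    """
    # ||D||_2^2 <= ||D||_F^2 = number of unit-norm atoms, so 1/len(atoms) is a safe step.
    step = 1.0 / len(atoms)
    x = [0.0] * len(atoms)
    for _ in range(iters):
        # Residual r = D x - y.
        r = [-yi for yi in y]
        for xj, a in zip(x, atoms):
            for i, ai in enumerate(a):
                r[i] += xj * ai
        # Gradient step on each coefficient (gradient is a_j . r), then soft-threshold.
        x = [soft_threshold(xj - step * sum(ai * ri for ai, ri in zip(a, r)), step * lam)
             for xj, a in zip(x, atoms)]
    return x


def src_classify(atoms, labels, y, lam=0.01):
    """Return the label whose atoms reconstruct y with the smallest residual."""
    y = normalize(y)
    x = sparse_code(atoms, y, lam)
    best, best_res = None, float("inf")
    for cls in set(labels):
        recon = [0.0] * len(y)
        for xj, a, lab in zip(x, atoms, labels):
            if lab == cls:  # keep only this class's coefficients
                for i, ai in enumerate(a):
                    recon[i] += xj * ai
        res = math.sqrt(sum((yi - ri) ** 2 for yi, ri in zip(y, recon)))
        if res < best_res:
            best, best_res = cls, res
    return best


def self_learning_binarize(binaries, features, lam=0.01):
    """Combine K binary maps (1 = text, 0 = background) and re-classify disputed pixels.

    features[r][c] is a (hypothetical) feature vector for pixel (r, c).
    """
    k = len(binaries)
    rows, cols = len(binaries[0]), len(binaries[0][0])
    votes = [[sum(b[r][c] for b in binaries) for c in range(cols)] for r in range(rows)]
    out = [[None] * cols for _ in range(rows)]
    atoms, labels = [], []
    # Pixels on which every method agrees become self-labeled training samples.
    for r in range(rows):
        for c in range(cols):
            if votes[r][c] in (0, k):
                out[r][c] = 1 if votes[r][c] == k else 0
                atoms.append(normalize(features[r][c]))
                labels.append(out[r][c])
    # Disputed pixels are re-classified by sparse representation.
    for r in range(rows):
        for c in range(cols):
            if out[r][c] is None:
                if atoms:
                    out[r][c] = src_classify(atoms, labels, features[r][c], lam)
                else:  # no confident samples: fall back to majority vote
                    out[r][c] = 1 if 2 * votes[r][c] > k else 0
    return out
```

Each disputed pixel is coded over the dictionary of confidently labeled pixels. It is then assigned to whichever class (text or background) reconstructs it with the smaller residual. The ISTA step size uses the bound ||D||_2^2 <= ||D||_F^2, which equals the number of unit-norm atoms. That makes the step conservative but keeps the iteration convergent. On a real page the set of confident pixels is far too large to use as a dictionary directly, so some sampling of training atoms would be needed.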
