A Method for Document Image Binarization based on Histogram Matching and Repeated Contrast Enhancement

This paper introduces a new method for binarizing document images. During training, the method stores the histogram of each training image together with that image's optimal binarization threshold. Training images are presented in pairs, a noisy version and a clean binarized version, and the clean version is used to find the optimal threshold. During use, the method compares the stored histograms with the histogram of the image to be binarized. If a sufficiently close match is found, the image is binarized using the threshold associated with the matching stored histogram. If no match is found, the contrast of the image is slightly enhanced and a new attempt is made. This sequence is repeated until either a match is found or a (rare) timeout is reached. The method has been applied to a set of test images and has been shown to outperform several comparable methods.
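The train-then-match loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. The abstract does not specify the histogram distance, the contrast enhancement, the threshold criterion, or the parameter values, so the chi-square distance, the linear stretch about mid-gray, the minimum-disagreement threshold search, and the names `max_dist`, `gain`, and `max_iters` are all assumptions. Images are represented as flat lists of 8-bit gray levels.

```python
from typing import List, Optional, Sequence, Tuple

Store = List[Tuple[List[float], int]]  # (normalized histogram, threshold) pairs


def normalized_histogram(pixels: Sequence[int]) -> List[float]:
    """Return the 256-bin gray-level histogram, scaled to sum to 1."""
    if not pixels:
        raise ValueError("image has no pixels")
    counts = [0] * 256
    for p in pixels:
        counts[p] += 1
    return [c / len(pixels) for c in counts]


def chi_square_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Assumed distance: 0 for identical histograms, 1 for disjoint ones."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def optimal_threshold(noisy: Sequence[int], clean: Sequence[int]) -> int:
    """Assumed criterion: lowest threshold with the fewest pixel
    disagreements against the clean binarized image (0 = ink, 255 = paper)."""
    best_t, best_err = 0, len(noisy) + 1
    for t in range(256):
        err = sum((p > t) != (c > 127) for p, c in zip(noisy, clean))
        if err < best_err:
            best_t, best_err = t, err
    return best_t


def train(pairs: Sequence[Tuple[Sequence[int], Sequence[int]]]) -> Store:
    """Store one (histogram, threshold) entry per (noisy, clean) pair."""
    return [(normalized_histogram(n), optimal_threshold(n, c)) for n, c in pairs]


def enhance_contrast(pixels: Sequence[int], gain: float = 1.05) -> List[int]:
    """Assumed enhancement: slight linear stretch about mid-gray, clipped."""
    return [min(255, max(0, round(128 + gain * (p - 128)))) for p in pixels]


def binarize(pixels: Sequence[int], store: Store,
             max_dist: float = 0.1, max_iters: int = 50) -> Optional[List[int]]:
    """Match, else enhance and retry; return None on timeout."""
    if not store:
        return None
    current = list(pixels)
    for _ in range(max_iters):
        hist = normalized_histogram(current)
        # Closest stored histogram and the threshold saved with it.
        dist, threshold = min((chi_square_distance(hist, h), t) for h, t in store)
        if dist <= max_dist:
            return [255 if p > threshold else 0 for p in current]
        current = enhance_contrast(current)
    return None
```

In this sketch a timeout is reported by returning `None`, which leaves the fallback (for example, a global threshold) to the caller. A bin-by-bin distance such as chi-square only matches histograms whose mass falls in the same bins, so a practical version would likely use coarser bins or a cross-bin distance.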
