A new binarization method for degraded document images

Image binarization is an important stage in any document analysis system, such as OCR. It converts color or grayscale images into monochrome form to reduce the computational complexity of subsequent stages. In old document images affected by degradations (ink bleed, stains, smear, non-uniform illumination, low contrast, etc.), separating foreground from background becomes a challenging task. Most existing binarization techniques can handle only a subset of these degradations. We present a simple binarization method for old document images. The method computes the Laplacian of an image to separate the foreground. The subtracted Laplacian image is then binarized using a global threshold. Finally, postprocessing with morphological functions is applied. Experimental results confirm that the proposed technique gives good binarization results in the presence of various degradations. The results are compared in terms of F-measure, PSNR, time complexity, and OCR-based evaluation, which show that our method outperforms existing techniques such as Niblack, Sauvola, Gatos, Zhou, NICK, Singh, and Bataineh.
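The three stages named in the abstract (Laplacian, global threshold, morphological postprocessing) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and the abstract leaves several details open, so the following are assumptions made only for illustration: "subtracted Laplacian image" is read as the original image minus its 4-neighbour Laplacian, the global threshold is chosen with Otsu's criterion, and the morphological step is reduced to removing isolated foreground pixels. All function names are hypothetical.

```python
def laplacian(img):
    """4-neighbour discrete Laplacian with replicated borders."""
    h, w = len(img), len(img[0])

    def px(y, x):
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [[px(y - 1, x) + px(y + 1, x) + px(y, x - 1) + px(y, x + 1)
             - 4 * img[y][x] for x in range(w)] for y in range(h)]


def otsu_threshold(values):
    """Global threshold in 0..255 maximising between-class variance.
    Assumption: the abstract does not say how the threshold is chosen."""
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def remove_isolated(mask):
    """Simplified stand-in for morphological postprocessing:
    drop foreground pixels that have no 8-connected foreground neighbour."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                neighbours = sum(
                    mask[j][i]
                    for j in range(max(y - 1, 0), min(y + 2, h))
                    for i in range(max(x - 1, 0), min(x + 2, w))) - 1
                if neighbours == 0:
                    out[y][x] = 0
    return out


def binarize(img):
    """img: rows of integer gray levels in 0..255, dark ink on a light page.
    Returns rows of 0/1, where 1 marks foreground (ink)."""
    h, w = len(img), len(img[0])
    lap = laplacian(img)
    # Assumed reading of "subtracted Laplacian": image minus Laplacian, clipped.
    sharp = [[min(255, max(0, img[y][x] - lap[y][x])) for x in range(w)]
             for y in range(h)]
    t = otsu_threshold([v for row in sharp for v in row])
    mask = [[1 if v <= t else 0 for v in row] for row in sharp]
    return remove_isolated(mask)


if __name__ == "__main__":
    page = [[200] * 7 for _ in range(6)]
    for y in range(6):
        page[y][3] = 40      # a vertical ink stroke
    page[2][6] = 40          # a single speck of noise
    for row in binarize(page):
        print("".join("#" if v else "." for v in row))
```

Subtracting the Laplacian exaggerates the contrast at stroke edges before thresholding, which is why a single global threshold can suffice in this toy case. A practical version would more likely use an array library and true erosion and dilation operators instead of nested lists.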
