Noise characterization in ancient document images based on DCT coefficient distribution

Ancient document images dating back several hundred years commonly suffer from noise and degradation, such as ink seeping through from the back page, 'fox' (local brown discoloration of the paper), text fading, background spots, and uneven background. Noise reduction (or denoising) is an important step in document image processing because it can improve optical character recognition (OCR) performance. Before a noise reduction algorithm is applied, it is important to characterize the types of noise present in the document. This paper proposes a method to characterize the noise types present in ancient documents based on the distribution of the image's discrete cosine transform (DCT) coefficients. The characterization is accomplished by analyzing the standard deviation of the distribution of the higher-frequency-band DCT coefficients of a cropped (localized) noise image. In simulations, three noise types found in Acehnese ancient documents, namely 'fox', spots, and uneven background, are characterized using the proposed method. The results suggest that DCT coefficient distributions can be used to characterize noise in ancient documents. In addition, the proposed method is shown to be applicable to document image classification.
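The abstract outlines the feature only briefly. It takes the DCT of a cropped noise patch and then the standard deviation of the coefficients in the higher-frequency band. The sketch below illustrates that idea under several assumptions the abstract does not state:

- the whole grayscale crop is transformed with an orthonormal 2-D DCT-II, with no block tiling;
- the "higher-frequency band" is taken to be the coefficients (u, v) with u + v >= cutoff * (M + N - 2);
- the helper names `dct_matrix`, `dct2`, `high_band_std` and the `cutoff` parameter are hypothetical.

It is a minimal illustration, not the authors' implementation.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    i = np.arange(n).reshape(1, -1)  # sample index (columns)
    c = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] *= np.sqrt(1.0 / n)   # scale DC row
    c[1:, :] *= np.sqrt(2.0 / n)  # scale AC rows
    return c


def dct2(patch: np.ndarray) -> np.ndarray:
    """Separable 2-D DCT-II of a grayscale patch (transform columns, then rows)."""
    patch = np.asarray(patch, dtype=float)
    m, n = patch.shape
    return dct_matrix(m) @ patch @ dct_matrix(n).T


def high_band_std(patch: np.ndarray, cutoff: float = 0.5) -> float:
    """Std. dev. of the 'higher-frequency band' DCT coefficients of a patch.

    Hypothetical band definition: coefficients (u, v) with
    u + v >= cutoff * (M + N - 2). For cutoff = 0.5 this is roughly the half
    of the coefficient plane farthest from the DC term.
    """
    coeffs = dct2(patch)
    m, n = coeffs.shape
    u, v = np.ogrid[:m, :n]  # broadcastable row/column frequency indices
    band = (u + v) >= cutoff * (m + n - 2)
    return float(np.std(coeffs[band]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic 32x32 patches (illustrative only, not data from the paper):
    smooth = np.tile(np.linspace(100.0, 200.0, 32), (32, 1))  # gradual gradient
    speckled = smooth + rng.normal(0.0, 20.0, size=(32, 32))   # gradient + noise
    print("smooth  :", high_band_std(smooth))
    print("speckled:", high_band_std(speckled))
```

In this toy demo, a smooth gradient concentrates its energy in low frequencies, so its high-band standard deviation stays near zero. Adding i.i.d. Gaussian noise gives a value close to the noise sigma, because an orthonormal DCT spreads such noise evenly across all coefficients. The synthetic patches only show how the statistic separates smooth from speckled content. Choosing the band and thresholds for real 'fox', spot, or uneven-background crops would follow the paper's own procedure.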
