Enhancement of historical printed document images by combining Total Variation regularization and Non-local Means filtering

This paper proposes a novel document enhancement method that combines two recent, powerful noise-reduction steps. The first step is based on the Total Variation framework: it flattens background grey levels and produces an intermediate image in which background noise is considerably reduced. This intermediate image is then used as a mask to produce an image with a cleaner background while keeping character details. The second step is applied to this cleaner image and consists of a filter based on Non-local Means, which smooths character edges by searching for similar image patches in pixel neighborhoods. The images to be enhanced are real historical printed documents from several periods, with various defects in their background and on character edges. These defects result from scanning, paper aging, and bleed-through. By combining the Total Variation and Non-local Means techniques, the proposed method enhances document images in order to improve OCR recognition. The combination is shown to be more powerful than either technique used alone and than other enhancement methods.
