Document Bleed-Through Removal Using Sparse Image Inpainting

Bleed-through is a pervasive degradation in ancient documents: ink from the opposite side of the sheet seeps through the paper fibers and appears as extra, interfering text. It severely impairs readability and makes the contents difficult to decipher. Digital image restoration techniques have been successfully employed to remove or significantly reduce this distortion. The central task is to identify the bleed-through pixels and estimate an appropriate replacement for them, consistent with their surroundings. This paper proposes a two-step image restoration method that exploits information from both the recto and verso images. In the first step, the bleed-through pixels are identified using a non-stationary linear model of the two texts overlapped in the recto-verso pair. In the second step, a sparse-representation-based image inpainting technique with a non-negative sparsity constraint is used to find an appropriate replacement for the bleed-through pixels. Thanks to the power of dictionary learning and sparse image reconstruction methods, the natural texture of the background is well reproduced in the bleed-through areas, and even a possible overestimation of those areas is effectively corrected, so that the original appearance of the document is preserved. Experiments are conducted on images from a popular database of ancient documents, and the results validate the performance of the proposed method against the state of the art.
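The second step can be illustrated with a minimal sketch, under several assumptions not stated in the abstract: a dictionary D (one atom per column, one row per patch pixel) has already been learned from clean background patches, the bleed-through mask comes from the first step, and a simple greedy non-negative matching pursuit stands in for whatever non-negative sparse solver the method actually uses. The function names and parameters are hypothetical.

```python
import numpy as np

def nn_matching_pursuit(y, D, known, n_iter=10, tol=1e-8):
    """Greedy non-negative sparse coding of y over D, fitted on known pixels only.

    Illustrative stand-in solver, not the paper's algorithm.
    """
    Dk = D[known]                          # atom values at the known pixel positions
    norms = np.linalg.norm(Dk, axis=0)
    norms[norms == 0] = 1.0                # atoms with no known support get zero correlation
    coef = np.zeros(D.shape[1])
    residual = y[known].astype(float)
    for _ in range(n_iter):
        corr = Dk.T @ residual / norms     # normalized correlation with each atom
        k = int(np.argmax(corr))
        if corr[k] <= tol:                 # only positively correlated atoms are accepted
            break
        step = corr[k] / norms[k]
        coef[k] += step                    # coefficients only grow, so they stay >= 0
        residual -= step * Dk[:, k]
    return coef

def inpaint_patch(patch, bleed_mask, D, n_iter=10):
    """Replace pixels flagged in bleed_mask with the sparse reconstruction."""
    y = patch.ravel().astype(float)
    known = ~bleed_mask.ravel()
    coef = nn_matching_pursuit(y, D, known, n_iter)
    restored = y.copy()
    restored[~known] = (D @ coef)[~known]  # fill only the masked pixels
    return restored.reshape(patch.shape)
```

Because the code is fitted only on pixels outside the mask, a mask that slightly overestimates the bleed-through region still yields a background-consistent fill, which is in the spirit of the correction described above.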
