A non-stationary density model to separate overlapped texts in degraded documents

We address the problem of removing a text superimposed on a more important one in a document image. We consider two instances: canceling back-to-front interferences in recto and verso images of archival documents, and recovering the erased text in palimpsests from multispectral images. Both problems are approached through a model in which the ideal images of the two texts are treated as individual source patterns, mixed through some parametric operator. To cope with occlusions, ink saturation, and spatial variability of the mixing operator, a data model for this problem should be nonlinear and space-variant. Here, we show that if pointwise non-stationarity is allowed, a linear model can compensate for the lack of a suitable nonlinearity and for other modeling errors.
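
As an illustration of what a pointwise non-stationary linear model can look like, the sketch below inverts a 2x2 linear mixture whose coefficients are allowed to differ at every pixel. It is a minimal sketch under stated assumptions, not the method of the paper: the function name `unmix_pointwise`, the coefficient maps `a_rv` and `a_vr`, and the specific mixing form are hypothetical, and the coefficient maps are taken as given, since the abstract does not say how they are estimated.

```python
import numpy as np


def unmix_pointwise(d_recto, d_verso, a_rv, a_vr, eps=1e-6):
    """Invert a per-pixel 2x2 linear mixture (illustrative assumption).

    Assumed model, applied independently at each pixel:
        d_recto = s_recto + a_vr * s_verso
        d_verso = a_rv * s_recto + s_verso
    where a_rv and a_vr are coefficient maps with the same shape as the
    images, so the mixing is allowed to change from pixel to pixel.
    """
    det = 1.0 - a_rv * a_vr
    # Guard against a (near-)singular local mixing matrix.
    det = np.where(np.abs(det) < eps, eps, det)
    s_recto = (d_recto - a_vr * d_verso) / det
    s_verso = (d_verso - a_rv * d_recto) / det
    return s_recto, s_verso


# Synthetic check: mix two random "source" maps with space-variant
# coefficients, then recover them with the same coefficients.
rng = np.random.default_rng(0)
shape = (8, 8)
s_r = rng.random(shape)
s_v = rng.random(shape)
a_rv = 0.8 * rng.random(shape)  # recto-into-verso strength per pixel
a_vr = 0.8 * rng.random(shape)  # verso-into-recto strength per pixel

d_r = s_r + a_vr * s_v
d_v = a_rv * s_r + s_v

est_r, est_v = unmix_pointwise(d_r, d_v, a_rv, a_vr)
recovered = np.allclose(est_r, s_r) and np.allclose(est_v, s_v)
```

With a single global pair of coefficients the same function reduces to an ordinary stationary linear unmixing; letting the coefficients vary per pixel is what gives a linear model room to absorb local effects such as uneven ink penetration.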
