Nonlinear model and constrained ML for removing back-to-front interferences from recto-verso documents

In this paper, we treat the removal of back-to-front interferences from scans of double-sided documents as a blind source separation problem, and extend our previous linear mixing model to a more effective nonlinear mixing model. We consider the ideal front and back images as two individual patterns overlapped in the observed recto and verso scans, and apply an unsupervised constrained maximum likelihood technique to separate them. Through several real examples, we show that the results obtained by this approach are much better than those obtained through data decorrelation or independent component analysis. Compared with approaches based on segmentation/classification, which often aim at cleaning the foreground text by removing all the textured background, one advantage of our method is that cleaning does not alter genuine features of the document, such as color or other structures it may contain. This is particularly valuable when the document is of historical importance, since its readability can be improved while maintaining its original appearance.
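To make the baseline concrete, the following is a minimal sketch of data decorrelation under a linear mixing model, one of the two reference techniques named above. It is not the nonlinear constrained maximum likelihood method of this paper. The function name `symmetric_decorrelation`, the synthetic images, and the 2x2 mixing matrix are all illustrative assumptions, and the two scans are assumed to be grayscale, registered, and with the verso already mirrored.

```python
import numpy as np

def symmetric_decorrelation(recto, verso):
    """Decorrelate two registered grayscale scans by symmetric whitening.

    Hypothetical helper: treats each pixel pair (recto, verso) as a
    2-vector and applies W = C^(-1/2), where C is the 2x2 data covariance.
    """
    x = np.stack([recto.ravel(), verso.ravel()]).astype(float)  # shape (2, N)
    x_centered = x - x.mean(axis=1, keepdims=True)
    cov = np.cov(x_centered)                      # 2x2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)        # cov is symmetric
    whitening = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    y = whitening @ x_centered                    # decorrelated outputs
    return y[0].reshape(recto.shape), y[1].reshape(verso.shape)

# Synthetic stand-ins for the ideal front and back patterns (assumption).
rng = np.random.default_rng(0)
front = rng.random((64, 64))
back = rng.random((64, 64))

# Hypothetical linear mixing: each observed side shows some of the other.
mixing = np.array([[1.0, 0.3],
                   [0.25, 1.0]])
observed_recto = mixing[0, 0] * front + mixing[0, 1] * back
observed_verso = mixing[1, 0] * front + mixing[1, 1] * back

out_front, out_back = symmetric_decorrelation(observed_recto, observed_verso)
```

Decorrelation only guarantees that the two outputs are uncorrelated with unit variance; it does not guarantee that they match the ideal front and back images, which is one reason a more faithful mixing model can be expected to help.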
