Non-stationary modeling for the separation of overlapped texts in documents

In this paper, we address the removal of severe back-to-front interference in archival documents when images of both the recto and the verso of the page are available. We approach the problem from a modeling point of view, treating the ideal images of the two separated texts as individual source patterns that overlap in the observed images through a parametric mixing operator. Earlier approaches were based on linear mixtures of the ideal reflectance maps, or of the ideal optical densities and absorptance maps, through unknown coefficients or blur kernels. Approximations and/or partial user supervision were then adopted to jointly estimate the sources and the model parameters. However, a feasible and reliable data model for this problem should at least be non-linear and space-variant, to cope with occlusions, ink saturation, and large variability in the mixing level. This is especially true for ancient documents affected by ink seeping (bleed-through). The search for such a model is still far from concluded, and may even be impossible to pursue, because information about the chemical and physical processes underlying the phenomenon is unavailable. Hence, we propose the use of pixel-dependent parameters within a model that is additive in the optical densities, to compensate not only for non-stationarity but also for missing or imprecise knowledge of the non-linearity and, more generally, for modeling errors.
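
The abstract does not state the model equations, so the following is only a minimal sketch of what a model "additive in the optical densities" with pixel-dependent parameters could look like. It assumes that the verso image has already been mirrored and registered to the recto, that optical density is defined as the negative logarithm of reflectance relative to the paper background, and that blur is ignored. The function names and the per-pixel mixing maps `q_recto` and `q_verso` are hypothetical and are not taken from the paper; in practice the maps would have to be estimated, which this sketch does not attempt.

```python
import numpy as np

def optical_density(reflectance, background, eps=1e-6):
    """Convert reflectance to optical density, D = -log(s / R).

    Assumption: `background` is the reflectance of clean paper.
    """
    ratio = np.clip(np.asarray(reflectance, dtype=float) / background, eps, 1.0)
    return -np.log(ratio)

def mix_densities(d_recto, d_verso, q_recto, q_verso):
    """Forward model (assumed form): each observed side is its own density
    plus the opposite side's density scaled by a per-pixel map."""
    obs_recto = d_recto + q_verso * d_verso
    obs_verso = d_verso + q_recto * d_recto
    return obs_recto, obs_verso

def unmix_densities(obs_recto, obs_verso, q_recto, q_verso):
    """Invert the 2x2 mixture independently at every pixel.

    Requires q_recto * q_verso != 1 everywhere; maps with values in [0, 1)
    satisfy this.
    """
    det = 1.0 - q_recto * q_verso
    d_recto = (obs_recto - q_verso * obs_verso) / det
    d_verso = (obs_verso - q_recto * obs_recto) / det
    return d_recto, d_verso

# Small synthetic check with spatially varying (non-stationary) mixing maps.
rng = np.random.default_rng(0)
shape = (4, 5)
d_r = rng.uniform(0.0, 2.0, shape)   # ideal recto density
d_v = rng.uniform(0.0, 2.0, shape)   # ideal verso density
q_r = rng.uniform(0.0, 0.8, shape)   # recto-to-verso mixing level per pixel
q_v = rng.uniform(0.0, 0.8, shape)   # verso-to-recto mixing level per pixel

obs_r, obs_v = mix_densities(d_r, d_v, q_r, q_v)
est_r, est_v = unmix_densities(obs_r, obs_v, q_r, q_v)
```

Because the mixing maps vary from pixel to pixel, the inversion is a separate 2x2 solve at each location. This is one way to express the space-variant behavior that a single global coefficient cannot capture.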
