Fast correction of bleed-through distortion in grayscale documents by a blind source separation technique

Ancient documents are usually degraded by strong background artifacts. These are often caused by the so-called bleed-through effect, a pattern that interferes with the main text because ink has seeped through from the reverse side. A similar effect, called show-through and due to the imperfect opacity of the paper, may appear in scans of even modern, well-preserved documents. These degradations must be removed to improve human or automatic readability. For this purpose, when a color scan of the document is available, we have shown that a simplified linear model of pattern overlapping allows us to use very fast blind source separation techniques. This approach, however, cannot be applied to grayscale scans. This is a serious limitation, since many collections in our libraries and archives are now available only as grayscale scans or microfilms. We propose here a new model for bleed-through in grayscale document images, based on the availability of both the recto and verso pages, and show that blind source separation can be successfully applied in this case too. Some experiments with real ancient documents are presented and described.
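As a rough illustration of the idea, the sketch below treats the recto scan and the mirrored verso scan as two linear mixtures of two unknown patterns and separates them by symmetric whitening (decorrelation). The abstract does not specify the mixing model or the separation algorithm, so several things here are assumptions made only for this example: the 2x2 instantaneous mixing model, the choice of symmetric whitening, the requirement that the two pages are already registered and of equal size, and the function name `separate_recto_verso`.

```python
import numpy as np

def separate_recto_verso(recto, verso):
    """Decorrelate two registered grayscale pages (hypothetical helper).

    Assumes each scan is a linear mixture of the recto and verso patterns.
    Returns two zero-mean, unit-variance images: the estimated recto pattern
    and the estimated verso pattern (mirrored back to the verso orientation).
    """
    # Mirror the verso so that its pixels line up with the recto pixels.
    verso_aligned = np.fliplr(verso)

    # One row per observation, one column per pixel.
    X = np.stack([recto.ravel(), verso_aligned.ravel()]).astype(float)
    Xc = X - X.mean(axis=1, keepdims=True)

    # 2x2 sample covariance of the two observations.
    cov = Xc @ Xc.T / Xc.shape[1]

    # Symmetric whitening matrix W = cov^(-1/2), via eigendecomposition.
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals = np.maximum(eigvals, 1e-12)  # guard against a singular covariance
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T

    # Outputs are mutually uncorrelated with unit variance.
    Y = W @ Xc
    est_recto = Y[0].reshape(recto.shape)
    est_verso = np.fliplr(Y[1].reshape(recto.shape))
    return est_recto, est_verso
```

Under these assumptions, if the mixing is symmetric (each side bleeds into the other with the same strength) and the two patterns are uncorrelated with similar variance, symmetric whitening approximately inverts the mixing. The outputs are normalized, so they would need rescaling to a displayable gray range before viewing.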
