Restoring Ink Bleed-Through Degraded Document Images Using a Recursive Unsupervised Classification Technique

This paper presents a new method for restoring a particular type of degradation found in ancient document images. This degradation, referred to as “bleed-through”, is caused by the porosity of the paper, the chemical quality of the ink, or the conditions of digitization. It appears as marks that degrade the readability of the document image. Our purpose is therefore to remove these marks and improve readability. The proposed method is based on a recursive unsupervised segmentation approach applied in the data space decorrelated by principal component analysis. It generates a binary tree in which only the leaf images satisfying a certain condition on their logarithmic histogram are processed. Experiments on real ancient document images provided by the archives of “Chatillon-Chalaronne” illustrate the effectiveness of the proposed method.
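The recursive scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the classifier, the histogram condition, or the stopping rules. The sketch therefore assumes a plain two-class k-means at each node, recomputes the principal component analysis on each node's pixels, and uses a hypothetical condition (at least two prominent peaks in the logarithmic histogram of the first principal component) to decide whether a leaf is processed further. The names `pca_decorrelate`, `two_means`, `needs_split`, `build_tree`, and `leaves`, and the parameters `bins`, `rel_height`, `max_depth`, and `min_size`, are all illustrative.

```python
import numpy as np


def pca_decorrelate(pixels):
    """Project N x C pixel vectors (C >= 2) onto their principal axes."""
    centered = pixels - pixels.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]           # strongest component first
    return centered @ eigvecs[:, order]


def two_means(data, iters=20):
    """Plain 2-class k-means (assumed classifier), deterministic initialisation
    from the two extreme points along the first principal component."""
    centers = np.stack([data[np.argmin(data[:, 0])],
                        data[np.argmax(data[:, 0])]])
    labels = np.zeros(len(data), dtype=int)
    for _ in range(iters):
        dists = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = data[labels == k].mean(axis=0)
    return labels


def needs_split(values, bins=32, rel_height=0.5):
    """HYPOTHETICAL condition on the logarithmic histogram: the node is
    processed further if the histogram has at least two prominent peaks."""
    counts, _ = np.histogram(values, bins=bins)
    log_hist = np.log1p(counts)
    padded = np.concatenate([[0.0], log_hist, [0.0]])
    is_peak = ((padded[1:-1] > padded[:-2])
               & (padded[1:-1] >= padded[2:])
               & (padded[1:-1] >= rel_height * log_hist.max()))
    return bool(is_peak.sum() >= 2)


def build_tree(pixels, indices=None, depth=0, max_depth=3, min_size=50):
    """Recursively split the pixel set into a binary tree of classes."""
    if indices is None:
        indices = np.arange(len(pixels))
    node = {"indices": indices, "children": []}
    if depth >= max_depth or len(indices) < min_size:
        return node
    projected = pca_decorrelate(pixels[indices])   # assumption: PCA per node
    if not needs_split(projected[:, 0]):
        return node                                # leaf is left as it is
    labels = two_means(projected)
    if labels.min() == labels.max():
        return node                                # degenerate split, stop
    for k in range(2):
        child = build_tree(pixels, indices[labels == k],
                           depth + 1, max_depth, min_size)
        node["children"].append(child)
    return node


def leaves(node):
    """Collect the leaf nodes of the tree, left to right."""
    if not node["children"]:
        return [node]
    return [leaf for child in node["children"] for leaf in leaves(child)]
```

For a colour image stored as an H x W x 3 array, `build_tree(image.reshape(-1, 3).astype(float))` would return the tree, and each leaf's `"indices"` entry lists the flattened pixel positions assigned to that class. Deciding which leaves correspond to bleed-through marks, and how to replace them, is left open here because the abstract does not describe that step.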
