A Recursive Approach for Bleed-Through Removal

Historical documents are valuable resources worth preserving because they support our cultural and social knowledge. Unfortunately, because they are made of fragile materials, these documents are often affected by several types of degradation. Applying restoration techniques to captured digital images of degraded historical documents can be a quick and efficient way to preserve them and avoid loss of their content. This paper presents a new method for restoring a particular type of degradation known as "bleed-through", which is caused by characters from the reverse side of the page interfering with the text to be read. Our method is based on a recursive approach that relies on two types of analysis: Principal Component Analysis (PCA) and the k-means clustering algorithm. The aim is to extract clean text images from these areas of interfering, overlapping characters. Our restoration method analyses the front-side image alone and corrects the unwanted image components. The paper concludes with experimental results that demonstrate the effectiveness of the proposed method.
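The abstract does not specify how PCA and k-means are combined or what exactly is recursive. As one hypothetical illustration only, the sketch below treats an RGB image as a flat list of pixels and recursively bisects its colour distribution: at each step it projects the pixels onto their first principal axis and splits them with 2-means on that projection. It then assumes that the darkest resulting class is front-side ink, the lightest is paper, and every other class is bleed-through to be repainted with the paper colour. The function names, the `depth` and `min_spread` parameters, and the darkest/lightest labelling rule are all assumptions for illustration, not the authors' algorithm.

```python
import math

Pixel = tuple[float, float, float]


def _mean(pixels: list[Pixel]) -> Pixel:
    n = len(pixels)
    return (sum(p[0] for p in pixels) / n,
            sum(p[1] for p in pixels) / n,
            sum(p[2] for p in pixels) / n)


def first_principal_axis(pixels: list[Pixel], iters: int = 50) -> tuple[Pixel, list[float]]:
    """Mean colour and leading eigenvector of the 3x3 RGB covariance (power iteration)."""
    m = _mean(pixels)
    cov = [[0.0] * 3 for _ in range(3)]
    for p in pixels:
        d = [p[i] - m[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                cov[i][j] += d[i] * d[j]
    v = [1.0, 1.0, 1.0]  # start vector for power iteration
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all pixels identical: no variance to explain
            break
        v = [x / norm for x in w]
    return m, v


def two_means_threshold(values: list[float], iters: int = 20) -> float:
    """1-D k-means with k=2 (Lloyd iterations); returns the midpoint of the two centroids."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        thr = (lo + hi) / 2
        a = [x for x in values if x <= thr]
        b = [x for x in values if x > thr]
        if not a or not b:
            break
        lo, hi = sum(a) / len(a), sum(b) / len(b)
    return (lo + hi) / 2


def recursive_split(pixels: list[Pixel], idx: list[int], depth: int,
                    min_spread: float = 1.0) -> list[list[int]]:
    """Hypothetical recursion: project onto the first PC, bisect with 2-means, repeat."""
    if depth == 0 or len(idx) < 2:
        return [idx]
    sub = [pixels[i] for i in idx]
    m, v = first_principal_axis(sub)
    proj = [sum((p[k] - m[k]) * v[k] for k in range(3)) for p in sub]
    if max(proj) - min(proj) < min_spread:  # colours too close to separate further
        return [idx]
    thr = two_means_threshold(proj)
    left = [i for i, x in zip(idx, proj) if x <= thr]
    right = [i for i, x in zip(idx, proj) if x > thr]
    if not left or not right:
        return [idx]
    return (recursive_split(pixels, left, depth - 1, min_spread)
            + recursive_split(pixels, right, depth - 1, min_spread))


def remove_bleed_through(pixels: list[Pixel], depth: int = 2) -> list[Pixel]:
    """Keep the darkest colour class as ink; repaint every other class with the paper colour."""
    clusters = recursive_split(pixels, list(range(len(pixels))), depth)
    means = [_mean([pixels[i] for i in c]) for c in clusters]
    luma = [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in means]
    ink = luma.index(min(luma))                 # assumed: front-side ink is darkest
    background = means[luma.index(max(luma))]   # assumed: paper is lightest
    out = list(pixels)
    for c_id, cluster in enumerate(clusters):
        if c_id != ink:
            for i in cluster:
                out[i] = background
    return out
```

For example, on a toy "image" containing paper `(240, 230, 210)`, a lighter bleed-through tone `(200, 185, 175)` and dark ink `(30, 30, 40)`, `remove_bleed_through` separates the three colours and replaces the bleed-through pixels with the paper colour while leaving the ink untouched. Real bleed-through is rarely this cleanly separated in colour, which is one reason a recursive refinement on a single front-side image is attractive.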
