Restoration of Archival Documents Using a Wavelet Technique

This paper addresses the problem of restoring handwritten archival documents in which ink has seeped through the paper, so that handwriting from the reverse side interferes with the contents of each side. We present a novel method that first matches both sides of a document so that each interfering stroke is mapped to the corresponding stroke originating from the reverse side. This mapping facilitates the identification of foreground and interfering strokes. A wavelet reconstruction process then iteratively enhances the foreground strokes and smears the interfering strokes, strengthening the ability of an improved Canny edge detector to discriminate against the interfering strokes. The method has been shown to restore documents effectively, achieving average precision and recall of 84 percent and 96 percent, respectively, for foreground text extraction.
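
The pipeline summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and every name and parameter value in it (`align_reverse`, `classify`, `emphasize`, `gain`, `damp`, `thresh`, `max_shift`) is hypothetical. It assumes that the two scans differ only by a left-right mirror and a small integer translation, labels a pixel as interfering when the aligned reverse side is darker there, swaps in a one-level Haar transform (scaling detail coefficients) for the paper's wavelet reconstruction, and uses a fixed threshold as a stand-in for the improved Canny edge detector.

```python
import numpy as np


def haar2(img):
    """One-level 2-D Haar DWT (orthonormal); img must have even height and width."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    ll = (a + b + c + d) / 2  # approximation (scaled 2x2 block mean)
    lh = (a - b + c - d) / 2  # detail across columns
    hl = (a + b - c - d) / 2  # detail across rows
    hh = (a - b - c + d) / 2  # diagonal detail
    return ll, lh, hl, hh


def ihaar2(ll, lh, hl, hh):
    """Exact inverse of haar2."""
    out = np.empty((2 * ll.shape[0], 2 * ll.shape[1]))
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2
    out[0::2, 1::2] = (ll - lh + hl - hh) / 2
    out[1::2, 0::2] = (ll + lh - hl - hh) / 2
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return out


def align_reverse(front_ink, back_ink, max_shift=3):
    """Mirror the reverse scan and pick the integer shift that maximizes ink
    correlation with the front. Images hold ink density in [0, 1] (0 = paper).
    Assumes a pure mirror + translation; np.roll wraps at the border, which is
    tolerable only for shifts much smaller than the page."""
    mirrored = np.fliplr(back_ink)
    best, best_score = mirrored, -np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            cand = np.roll(mirrored, (dy, dx), axis=(0, 1))
            score = float(np.sum(front_ink * cand))
            if score > best_score:
                best, best_score = cand, score
    return best


def classify(front_ink, aligned_back, thresh=0.2):
    """Inked front pixels are 'interfering' where the aligned reverse side is
    darker (bleed-through is assumed fainter than the stroke on its own side);
    the remaining inked pixels are foreground."""
    inked = front_ink > thresh
    interfering = inked & (aligned_back > front_ink)
    return inked & ~interfering, interfering


def emphasize(ink, fg, intf, gain=1.5, damp=0.3, iters=2):
    """Iteratively amplify Haar detail coefficients under foreground strokes and
    attenuate them under interfering strokes, which smears the latter."""
    h, w = ink.shape
    if h % 2 or w % 2:
        raise ValueError("this sketch needs even image dimensions")

    def blocks(mask):  # each coefficient covers one 2x2 pixel block
        return mask.reshape(h // 2, 2, w // 2, 2).any(axis=(1, 3))

    scale = np.where(blocks(fg), gain, np.where(blocks(intf), damp, 1.0))
    out = ink.astype(float)
    for _ in range(iters):
        ll, lh, hl, hh = haar2(out)
        out = np.clip(ihaar2(ll, lh * scale, hl * scale, hh * scale), 0.0, 1.0)
    return out


def precision_recall(pred, truth):
    """Pixel-level precision and recall of a boolean text mask."""
    tp = np.sum(pred & truth)
    return tp / max(np.sum(pred), 1), tp / max(np.sum(truth), 1)
```

Thresholding the output of `emphasize` (for example `out > 0.5`) gives a foreground mask that `precision_recall` can score against a hand-labeled mask. Plain ink correlation can be fooled when the foreground strokes of the two sides happen to overlap at a larger offset, which is why this sketch keeps `max_shift` small; a real matching step would need a more robust criterion.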
