Segmentation and Analysis of Double-Sided Handwritten Archival Documents

Many national archives and libraries preserve historical handwritten documents in good condition. One problem many archivists face is ink seeping through the pages of some double-sided handwritten documents after long periods of storage. This paper addresses this problem and develops a novel algorithm to extract clear text images from areas where strokes from the two sides interfere and overlap. Based on the key observation that the edges of strokes seeping through from the reverse side are not as sharp as those on the front side, we adopt an edge detection approach to suppress the unwanted background patterns. First, an improved Canny edge detector with an edge orientation constraint is proposed. These improvements can link more weak foreground edges without introducing noise. Second, a new edge expansion model is presented for recovering broken edges of the words or characters on the front side. Finally, an outline of the complete document analysis system is given. Segmentation results on real images are shown and evaluated.
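The abstract does not state the exact form of the edge orientation constraint. The following is a minimal sketch, assuming the constraint works during Canny-style hysteresis linking: a weak edge pixel is kept only if it is 8-connected to an already accepted edge pixel whose gradient direction is within a tolerance of its own. The names `link_edges` and `angle_diff` and the `max_diff` tolerance are hypothetical. Gradient computation and non-maximum suppression are assumed to have been done beforehand and are omitted. This is an illustration of the general idea, not the authors' algorithm, and the edge expansion model is not shown.

```python
from collections import deque


def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (circular)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def link_edges(mag, ang, low, high, max_diff=30.0):
    """Hysteresis edge linking with an added orientation constraint (sketch).

    mag, ang: 2-D lists of gradient magnitude and gradient direction in
    degrees, assumed to come from an earlier gradient + non-maximum
    suppression step (not shown). Pixels with mag >= high are strong edges.
    A weak pixel (low <= mag < high) is accepted only if it is 8-connected
    to an accepted pixel whose direction differs by at most max_diff degrees.
    """
    rows, cols = len(mag), len(mag[0])
    edges = [[False] * cols for _ in range(rows)]
    queue = deque()

    # Seed the search with every strong edge pixel.
    for r in range(rows):
        for c in range(cols):
            if mag[r][c] >= high:
                edges[r][c] = True
                queue.append((r, c))

    # Grow outward from accepted pixels into orientation-consistent weak pixels.
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols:
                    if (not edges[nr][nc]
                            and low <= mag[nr][nc] < high
                            and angle_diff(ang[nr][nc], ang[r][c]) <= max_diff):
                        edges[nr][nc] = True
                        queue.append((nr, nc))
    return edges


# Toy 1x5 example: a strong pixel followed by three weak pixels, the last of
# which turns sharply in gradient direction (170 vs 100 degrees).
mag = [[100, 40, 40, 40, 5]]
ang = [[90, 95, 100, 170, 100]]
print(link_edges(mag, ang, low=20, high=80, max_diff=30))
print(link_edges(mag, ang, low=20, high=80, max_diff=180))
```

With `max_diff=30` the sharply turning weak pixel is rejected (`[[True, True, True, False, False]]`), while `max_diff=180` reduces the rule to plain hysteresis and accepts it (`[[True, True, True, True, False]]`). Comparing each weak pixel with the neighbour it is reached from tolerates gradual curvature along a stroke but blocks abrupt direction changes, which is one plausible way such a constraint could link weak foreground edges while rejecting unrelated background responses.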