Historical Handwritten Document Image Segmentation Using Background Light Intensity Normalization

This paper presents a new document binarization algorithm for camera images of historical handwritten documents, particularly those held by the Library of Congress of the United States. The algorithm uses two background light intensity normalization algorithms to enhance the images before a local adaptive binarization algorithm is applied. The normalization algorithms use adaptive linear and non-linear functions to approximate the uneven background of the images, which results from the uneven surface of the document paper, its aged color, and the light source of the cameras used for image lifting. Our algorithms adaptively capture the background of a document image with a "best fit" approximation. The document image is then normalized with respect to this approximation before a thresholding algorithm is applied. The technique works for both grayscale and color historical handwritten document images, with significant improvement in readability for both human readers and OCR.
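The pipeline described above (approximate the background, normalize the image against it, then threshold locally) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single least-squares plane as a stand-in for the adaptive linear approximation, division as the normalization step, and a simple local-mean rule as the adaptive threshold. The function names (`fit_plane_background`, `local_mean`, `binarize`) and the parameters `radius` and `ratio` are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_plane_background(gray):
    """Least-squares plane a*x + b*y + c fitted to all pixel intensities.

    Assumption: one global plane stands in for the paper's adaptive
    linear approximation of the background.
    """
    h, w = gray.shape
    ys, xs = np.mgrid[0:h, 0:w]
    design = np.column_stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    coeffs, *_ = np.linalg.lstsq(design, gray.ravel().astype(float), rcond=None)
    return (design @ coeffs).reshape(h, w)


def local_mean(img, radius):
    """Mean over a (2*radius+1)^2 window, computed with an integral image."""
    k = 2 * radius + 1
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    # Prepend a zero row and column so integral[i, j] == padded[:i, :j].sum()
    integral = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    window_sum = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
                  - integral[k:k + h, :w] + integral[:h, :w])
    return window_sum / (k * k)


def binarize(gray, radius=8, ratio=0.85):
    """Return a boolean mask that is True where a pixel is classified as ink.

    Assumption: a pixel is ink when its normalized intensity falls below
    `ratio` times the local mean of the normalized image.
    """
    background = fit_plane_background(gray)
    # Normalize with respect to the background approximation.
    normalized = gray.astype(float) / np.maximum(background, 1e-6)
    return normalized < ratio * local_mean(normalized, radius)
```

For a grayscale page loaded as a 2-D array, `binarize(page)` returns the ink mask. A color image would first need to be reduced to one channel or processed per channel, a step this sketch leaves out.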
