Historical document image segmentation using background light intensity normalization

This paper presents a new document binarization algorithm for camera images of historical documents, which are especially found in The Library of Congress of the United States. The algorithm uses a background light intensity normalization algorithm to enhance an image before a local adaptive binarization algorithm is applied. The image normalization algorithm uses an adaptive linear or non-linear function to approximate the uneven background of the image due to the uneven surface of the document paper, aged color or uneven light source of the cameras for image lifting. Our algorithm adaptively captures the background of a document image with a "best fit" approximation. The document image is then normalized with respect to the approximation before a thresholding algorithm is applied. The technique works for both gray scale and color historical handwritten document images with significant improvement in readability for both human and OCR.

[1]  Matti Pietikäinen,et al.  Adaptive document image binarization , 2000, Pattern Recognit..

[2]  Venu Govindaraju,et al.  Separating text and background in degraded document images - a comparison of global thresholding techniques for multi-stage thresholding , 2002, Proceedings Eighth International Workshop on Frontiers in Handwriting Recognition.

[3]  Venu Govindaraju,et al.  Historical document image enhancement using background light intensity normalization , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[4]  Chew Lim Tan,et al.  Document image enhancement using directional wavelet , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[5]  Chew Lim Tan,et al.  Matching of double-sided document images to remove interference , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.