A methodology for the separation of foreground/background in Arabic historical manuscripts using hybrid methods

This paper presents a new color document image segmentation system for historical Arabic manuscripts. The system is a hybrid method that couples a background light intensity normalization algorithm with k-means clustering and maximum likelihood (ML) estimation for foreground/background separation. First, the background normalization algorithm separates the foreground from the background; this foreground is used in the later steps. Second, because the normalization operates on luminance, it distorts the contrast; these distortions are corrected with gamma correction and contrast adjustment. Finally, the enhanced foreground image is segmented into foreground and background by ML estimation, with the initial parameters of the ML method estimated by a k-means clustering algorithm. The segmented image is used to produce the final restored document image. The techniques are tested on a set of historical Arabic manuscripts from the National Tunisian Library. The performance of the algorithm is demonstrated on real color manuscripts degraded by show-through effects, uneven background color, and localized spots.
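The last two stages described above (contrast correction, then ML segmentation initialized by k-means) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single luminance channel scaled to [0, 1], models each class as a one-dimensional Gaussian, uses hard-assignment ML iterations, and omits the background normalization stage entirely. The function names (`gamma_correct`, `stretch_contrast`, `kmeans_1d`, `ml_segment`) and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def gamma_correct(gray, gamma=0.8):
    """Power-law correction of a float image in [0, 1] (gamma is an assumed value)."""
    return np.clip(gray, 0.0, 1.0) ** gamma

def stretch_contrast(gray, low_pct=1.0, high_pct=99.0):
    """Linear contrast stretch between two percentiles (assumed adjustment)."""
    lo, hi = np.percentile(gray, [low_pct, high_pct])
    if hi <= lo:
        return np.zeros_like(gray, dtype=float)
    return np.clip((gray - lo) / (hi - lo), 0.0, 1.0)

def kmeans_1d(values, iters=20):
    """Two-cluster k-means on scalar values; label 0 is the darker cluster."""
    centers = np.array([values.min(), values.max()], dtype=float)
    for _ in range(iters):
        # Assign each value to the nearer of the two centers.
        labels = (np.abs(values - centers[0]) > np.abs(values - centers[1])).astype(int)
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = values[labels == k].mean()
    return labels

def ml_segment(gray, iters=10):
    """Return a boolean mask (True = foreground) by Gaussian ML classification.

    Class parameters are initialized from k-means, then re-estimated from the
    current labelling until the labels stop changing (hard assignments).
    """
    values = gray.ravel().astype(float)
    labels = kmeans_1d(values)
    for _ in range(iters):
        loglik = np.full((2, values.size), -np.inf)
        for k in (0, 1):
            members = values[labels == k]
            if members.size == 0:
                continue  # empty class keeps -inf likelihood
            mu, var = members.mean(), members.var() + 1e-6
            loglik[k] = -0.5 * np.log(2 * np.pi * var) - (values - mu) ** 2 / (2 * var)
        new_labels = np.argmax(loglik, axis=0)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    # Assumption: ink is darker than paper, so the darker class is foreground.
    means = [values[labels == k].mean() if np.any(labels == k) else np.inf
             for k in (0, 1)]
    return (labels == int(np.argmin(means))).reshape(gray.shape)
```

A typical call chains the three steps, for example `mask = ml_segment(stretch_contrast(gamma_correct(luminance)))`, where `luminance` is a 2-D float array. A real color manuscript would first need a luminance conversion and the background normalization step, neither of which is shown here.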
