Texture feature evaluation for segmentation of historical document images

Texture feature analysis has grown considerably in recent years and plays an important role in the analysis of many kinds of images. More recently, texture analysis has become a natural choice for segmenting historical document images, which are often heavily degraded and for which prior information on the document structure, such as the document model and the typographical parameters, is usually lacking. However, previous work on texture-based segmentation of digitized historical document images has been limited to testing a single well-known texture-based approach in isolation, such as the autocorrelation function, the Grey Level Co-occurrence Matrix (GLCM), Gabor filters, gradients, or wavelets. In this paper we ask which texture-based method is better suited, on the one hand, to discriminating graphical regions from textual ones and, on the other hand, to separating textual regions with different sizes and fonts. To this end, we compare three well-known texture-based approaches, the autocorrelation function, the GLCM, and Gabor filters, for the segmentation of digitized historical document images. The texture features are briefly described, and quantitative results are obtained on simplified historical document images. The results obtained are encouraging.
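
To make one of the compared feature families concrete, the following is a minimal sketch of GLCM-based texture features on a small grey-level patch. It is not the implementation evaluated in this paper: the function names `glcm` and `glcm_features`, the offset parameters `dx` and `dy`, the symmetric counting, and the choice of three Haralick-style statistics are all assumptions made for illustration.

```python
from collections import Counter


def glcm(image, dx=1, dy=0, symmetric=True):
    """Normalised grey-level co-occurrence matrix for one pixel offset.

    `image` is a list of rows of integer grey levels (hypothetical input
    format). Returns a dict mapping (level_a, level_b) to a probability.
    """
    rows, cols = len(image), len(image[0])
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[(a, b)] += 1
                if symmetric:
                    # Count the pair in both directions so the matrix is symmetric.
                    counts[(b, a)] += 1
    total = sum(counts.values())
    if total == 0:
        # No pixel pair fits the offset (e.g. a 1x1 patch).
        return {}
    return {pair: n / total for pair, n in counts.items()}


def glcm_features(p):
    """Three Haralick-style statistics of a normalised GLCM."""
    contrast = sum(prob * (i - j) ** 2 for (i, j), prob in p.items())
    angular_second_moment = sum(prob ** 2 for prob in p.values())
    homogeneity = sum(prob / (1 + (i - j) ** 2) for (i, j), prob in p.items())
    return {
        "contrast": contrast,
        "angular_second_moment": angular_second_moment,
        "homogeneity": homogeneity,
    }


# A uniform patch versus a checkerboard patch, horizontal offset of one pixel.
flat = glcm_features(glcm([[0, 0], [0, 0]]))
checker = glcm_features(glcm([[0, 1], [1, 0]]))
```

In this toy case the uniform patch has zero contrast and the checkerboard has a higher contrast, which is the kind of difference a texture-based segmentation could exploit to separate regions; in practice the features would be computed over sliding windows and several offsets.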
