Performance Evaluation and Benchmarking of Six Texture-Based Feature Sets for Segmenting Historical Documents

Texture-based features have recently been used to segment digitized historical document images. These methods have been shown to work effectively without a priori knowledge and to remain robust when applied to degraded documents under different noise levels and types. This paper presents an approach for evaluating and comparing texture-based feature sets for segmenting historical documents. We aim to determine which texture features are best suited, on the one hand, to separating graphical regions from textual ones and, on the other hand, to discriminating text across a variety of fonts and scales. For this purpose, six well-known and widely used texture-based feature sets (autocorrelation function, Grey Level Co-occurrence Matrix, Gabor filters, 3-level Haar wavelet transform, 3-level wavelet transform using a 3-tap Daubechies filter, and 3-level wavelet transform using a 4-tap Daubechies filter) are evaluated and compared on a large corpus of historical documents. The computation time and complexity of each feature set are also examined, and qualitative and numerical experiments demonstrate the performance of each feature set.
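To make one of the six feature sets concrete, the following is a minimal sketch of how Grey Level Co-occurrence Matrix features could be computed for a single image window. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the function names `glcm` and `haralick_features`, the single non-negative pixel offset, the number of grey levels, and the choice of three Haralick-style statistics (contrast, energy, homogeneity) are all hypothetical choices made for this example.

```python
import numpy as np

def glcm(image, dx=1, dy=0, levels=8):
    """Normalized co-occurrence matrix for one offset (dx, dy), with dx, dy >= 0.

    `image` must hold integer grey levels in the range [0, levels).
    (Illustrative helper; parameters are assumptions, not from the paper.)
    """
    img = np.asarray(image, dtype=np.intp)
    h, w = img.shape
    ref = img[:h - dy, :w - dx]   # reference pixels
    nbr = img[dy:, dx:]           # neighbours displaced by (dx, dy)
    counts = np.zeros((levels, levels), dtype=np.float64)
    # Accumulate one count per (reference level, neighbour level) pair.
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1.0)
    return counts / counts.sum()

def haralick_features(p):
    """Three common statistics of a normalized co-occurrence matrix p."""
    i, j = np.indices(p.shape)
    diff2 = (i - j) ** 2
    return {
        "contrast": float(np.sum(p * diff2)),
        "energy": float(np.sum(p ** 2)),
        "homogeneity": float(np.sum(p / (1.0 + diff2))),
    }

# Small 4-level example window (values chosen only for illustration).
window = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [0, 2, 2, 2],
                   [2, 2, 3, 3]])
features = haralick_features(glcm(window, dx=1, dy=0, levels=4))
```

In a segmentation setting, such a feature vector would typically be computed over a sliding window around each pixel and then passed to a clustering or classification step; the window size and the set of offsets are further parameters that this sketch leaves open.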
