Document image retrieval based on texture features and similarity fusion

In this paper, we investigate the usefulness of two different texture features, combined through classifier fusion, for document image retrieval. A local binary texture method, as a statistical approach, and a wavelet analysis technique, as a transform-based approach, are used for feature extraction, so that two feature vectors are obtained for every document image. For a given query, the distances between each of its two feature vectors and the corresponding feature vectors extracted from the document images in the training step are computed separately. To exploit the properties of both features, a classifier fusion technique is then applied: the distance measures obtained for each feature vector are combined by a weighted average. The document images are finally ranked by their visual similarity to the query, as given by the fused similarity measures. The Media Team Document Database, which provides a great variety of page layouts and contents, is used to evaluate the proposed method. In the experiments, the correct document is retrieved in 65.4% of cases within the Top-1 and in 91.8% of cases within the Top-10 ranked document lists.
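The fusion and ranking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the two distance sets are scaled or weighted, so the min-max normalization, the `weight` parameter, and the function names `min_max_normalize` and `fuse_and_rank` are assumptions introduced here for illustration only, and the distance values in the usage example are made up.

```python
def min_max_normalize(distances):
    """Rescale distances to [0, 1] so the two features are comparable.

    Assumption: min-max scaling; the abstract does not specify a normalization.
    """
    lo, hi = min(distances), max(distances)
    if hi == lo:
        # All distances are equal, so this feature carries no ranking information.
        return [0.0 for _ in distances]
    return [(d - lo) / (hi - lo) for d in distances]


def fuse_and_rank(lbp_distances, wavelet_distances, weight=0.5, top_k=10):
    """Fuse two per-document distance lists by weighted average and rank.

    lbp_distances[i] and wavelet_distances[i] are the distances between the
    query and stored document i for each feature. `weight` (hypothetical
    parameter) is the share given to the local binary texture distances;
    the wavelet distances receive 1 - weight.
    Returns (indices of the top_k most similar documents, fused distances).
    """
    if len(lbp_distances) != len(wavelet_distances):
        raise ValueError("both distance lists must cover the same documents")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")

    lbp_norm = min_max_normalize(lbp_distances)
    wav_norm = min_max_normalize(wavelet_distances)

    # Weighted average of the two normalized distances for each document.
    fused = [weight * a + (1.0 - weight) * b
             for a, b in zip(lbp_norm, wav_norm)]

    # A smaller fused distance means greater similarity to the query.
    ranking = sorted(range(len(fused)), key=lambda i: fused[i])
    return ranking[:top_k], fused


if __name__ == "__main__":
    # Made-up distances between one query and three stored documents.
    lbp = [0.1, 0.9, 0.5]
    wavelet = [10.0, 30.0, 50.0]
    top, fused = fuse_and_rank(lbp, wavelet, weight=0.75, top_k=3)
    print("ranked document indices:", top)
    print("fused distances:", fused)
```

Normalizing before averaging keeps a feature with a larger numeric range from dominating the fused score; in practice the weight would be tuned on held-out queries rather than fixed in advance.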
