Image retrieval using Markov Random Fields and global image features

In this paper, we propose a direct image retrieval framework based on Markov Random Fields (MRFs) that exploits the semantic context dependencies of the image. The novelty of our approach lies in the use of different kernels in our non-parametric density estimation, together with configurations that explore semantic relationships among concepts at the same time as low-level features, instead of focusing only on correlations between image features as in previous formulations. Hence, we introduce several configurations and study which one achieves the best performance. Results are presented for two datasets: the usual benchmark Corel 5k and the collection proposed by the 2009 edition of the ImageCLEF campaign. We observe that, using MRFs, performance on both datasets increases significantly, depending on the kernel used in the density estimation. With respect to the language model, the best results are obtained for the configuration that exploits dependencies between words together with dependencies between words and visual features. For the Corel 5k dataset, our best result corresponds to a mean average precision of 0.32, which compares favourably with the highest value ever obtained, 0.35, achieved by Makadia et al. [22], albeit with different features. For the ImageCLEF09 collection, we obtained a mean average precision of 0.32.
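For readers unfamiliar with the metric reported above, the following is a minimal sketch of how mean average precision is commonly computed for a set of queries. It is an illustration of the standard definition, not the evaluation code used in the paper; the function names and the toy image identifiers are hypothetical.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision for one query.

    Precision is measured at every rank where a relevant item appears,
    and the sum is divided by the total number of relevant items.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant)


def mean_average_precision(
    rankings: Dict[str, List[str]], relevance: Dict[str, Set[str]]
) -> float:
    """Mean of the per-query average precision values."""
    if not rankings:
        return 0.0
    scores = [
        average_precision(ranked, relevance.get(query, set()))
        for query, ranked in rankings.items()
    ]
    return sum(scores) / len(scores)


# Toy data (hypothetical): two single-word queries and their ranked images.
rankings = {
    "tiger": ["img1", "img2", "img3", "img4"],
    "beach": ["img5", "img6"],
}
relevance = {
    "tiger": {"img1", "img3"},
    "beach": {"img6"},
}
map_score = mean_average_precision(rankings, relevance)
```

In this toy case the "tiger" query has relevant images at ranks 1 and 3, and the "beach" query has its only relevant image at rank 2, so the score rewards rankings that place relevant images early.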

[1]  R. Manmatha,et al.  Statistical models for automatic video annotation and retrieval , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2]  Maosong Sun,et al.  Automatic Image Annotation Based on WordNet and Hierarchical Ensembles , 2006, CICLing.

[3]  Bin Wang,et al.  Dual cross-media relevance model for image annotation , 2007, ACM Multimedia.

[4]  Qi Zhang,et al.  Automatic image annotation by an iterative approach: incorporating keyword correlations and region matching , 2007, CIVR '07.

[5]  R. Manmatha,et al.  A discrete direct retrieval model for image and video retrieval , 2008, CIVR '08.

[6]  W. Bruce Croft,et al.  A Markov random field model for term dependencies , 2005, SIGIR '05.

[7]  Ali Farhadi,et al.  Unlabeled data improves word prediction , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[8]  Antonio Torralba,et al.  Statistics of natural image categories , 2003, Network.

[9]  R. Manmatha,et al.  A Model for Learning the Semantics of Pictures , 2003, NIPS.

[10]  R. Manmatha,et al.  Statistical models for text query-based image retrieval , 2008 .

[11]  J. A. Domínguez-Molina.  A practical procedure to estimate the shape parameter in the generalized Gaussian distribution , 2002 .

[12]  John C. Henderson,et al.  Direct Maximization of Average Precision by Hill-Climbing, with a Comparison to a Maximum Entropy Approach , 2004, HLT-NAACL.

[13]  Carlos Arturo Hernández-Gracidas,et al.  Markov Random Fields and Spatial Information to Improve Automatic Image Annotation , 2007, PSIVT.

[14]  Stefanie Nowak,et al.  Overview of the CLEF 2009 Large Scale - Visual Concept Detection and Annotation Task , 2009, CLEF.

[15]  Chong-Wah Ngo,et al.  A revisit of Generative Model for Automatic Image Annotation using Markov Random Fields , 2009, CVPR.

[16]  Luo Si,et al.  Effective automatic image annotation via a coherent language model and active learning , 2004, MULTIMEDIA '04.

[17]  Nando de Freitas,et al.  A Statistical Model for General Contextual Object Recognition , 2004, ECCV.

[18]  Dan I. Moldovan,et al.  Exploiting ontologies for automatic image annotation , 2005, SIGIR '05.

[19]  Gustavo Carneiro,et al.  Supervised Learning of Semantic Classes for Image Annotation and Retrieval , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  R. Manmatha,et al.  Multiple Bernoulli relevance models for image and video annotation , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004.

[21]  Hugo Jair Escalante,et al.  Word Co-occurrence and Markov Random Fields for Improving Automatic Image Annotation , 2007, BMVC.

[22]  Y. Mori,et al.  Image-to-word transformation based on dividing and vector quantizing images with words , 1999 .

[23]  Wei-Ying Ma,et al.  An adaptive graph model for automatic image annotation , 2006, MIR '06.

[24]  Latifur Khan,et al.  Image annotations by combining multiple evidence & wordNet , 2005, ACM Multimedia.

[25]  Gustavo Carneiro,et al.  Formulating semantic image annotation as a supervised learning problem , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[26]  Stefan M. Rüger,et al.  Automated Image Annotation Using Global Features and Robust Nonparametric Density Estimation , 2005, CIVR.

[27]  Vladimir Pavlovic,et al.  A New Baseline for Image Annotation , 2008, ECCV.

[28]  R. Manmatha,et al.  An Inference Network Approach to Image Retrieval , 2004, CIVR.

[29]  Ali Farhadi,et al.  Unlabeled Data Improves Word Prediction , 2009 .

[30]  Chin-Hui Lee,et al.  Bayesian Learning of Hierarchical Multinomial Mixture Models of Concepts for Automatic Image Annotation , 2006, CIVR.

[31]  Tao Mei,et al.  Correlative multi-label video annotation , 2007, ACM Multimedia.

[32]  Stefan M. Rüger,et al.  Information-theoretic semantic multimedia indexing , 2007, CIVR '07.

[33]  R. Manmatha,et al.  Automatic image annotation and retrieval using cross-media relevance models , 2003, SIGIR.

[34]  Chin-Hui Lee,et al.  Enhancing image annotation by integrating concept ontology and text-based bayesian learning model , 2007, ACM Multimedia.

[35]  David A. Forsyth,et al.  Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary , 2002, ECCV.