Paper information - A probabilistic semantic model for image annotation and multi-modal image retrieval

This paper addresses the automatic image annotation problem and its application to multi-modal image retrieval. The contribution of our work is three-fold. (1) We propose a probabilistic semantic model in which the visual features and the textual words are connected via a hidden layer; this layer constitutes the semantic concepts to be discovered and explicitly exploits the synergy among the modalities. (2) The association of visual features and textual words is determined in a Bayesian framework, so that a confidence for each association can be provided. (3) We report an extensive evaluation of the prototype system based on the model on a large-scale, visually and semantically diverse image collection crawled from the Web. In the proposed probabilistic model, the hidden concept layer, which connects the visual feature layer and the word layer, is discovered by fitting a generative model to the training images and annotation words through an Expectation-Maximization (EM) based iterative learning procedure. The evaluation of the prototype system for multi-modal image retrieval, on 17,000 images and 7,736 annotation words automatically extracted from crawled Web pages, indicates that the proposed semantic model and the developed Bayesian framework are superior to a state-of-the-art peer system in the literature.
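The abstract describes the model only at a high level, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes each hidden concept emits an image's visual feature vector from an isotropic Gaussian and its annotation words from a multinomial distribution, fits these by EM, and then annotates a new image by Bayesian inversion through the concept layer. The function names (`concept_posterior`, `fit_concept_model`, `annotate`), the farthest-point initialization, and the smoothing constants are all assumptions made for illustration.

```python
import numpy as np

def concept_posterior(F, pz, mu, var, W=None, pw=None):
    """Return P(z | image) for each row of F (optionally also using word counts W)."""
    d = F.shape[1]
    # Squared distance from every feature vector to every concept mean: (n, K)
    sq = ((F[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    # log P(z) + log N(f | mu_z, var_z * I)
    log_r = np.log(pz) - 0.5 * (sq / var + d * np.log(2 * np.pi * var))
    if W is not None:
        # Add the multinomial word log-likelihood: sum_w count(w) * log P(w | z)
        log_r = log_r + W @ np.log(pw).T
    log_r -= log_r.max(axis=1, keepdims=True)  # stabilise before exponentiating
    R = np.exp(log_r)
    return R / R.sum(axis=1, keepdims=True)

def fit_concept_model(F, W, n_concepts, n_iter=50):
    """EM for a toy hidden-concept model (assumed form, not the paper's exact one).

    F: (n_images, d) visual features; W: (n_images, n_words) word counts.
    """
    n, d = F.shape
    v = W.shape[1]
    # Deterministic farthest-point initialisation of the concept means.
    idx = [0]
    for _ in range(1, n_concepts):
        dist = ((F[:, None, :] - F[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(dist.argmax()))
    mu = F[idx].astype(float)
    var = np.full(n_concepts, F.var() + 1e-6)
    pz = np.full(n_concepts, 1.0 / n_concepts)
    pw = np.full((n_concepts, v), 1.0 / v)
    for _ in range(n_iter):
        # E-step: responsibility of each concept for each (image, words) pair.
        R = concept_posterior(F, pz, mu, var, W=W, pw=pw)
        # M-step: re-estimate all parameters from the responsibilities.
        nk = R.sum(axis=0) + 1e-12
        pz = nk / nk.sum()
        mu = (R.T @ F) / nk[:, None]
        sq = ((F[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        var = (R * sq).sum(axis=0) / (d * nk) + 1e-6
        pw = R.T @ W + 1e-3  # small additive smoothing keeps log(pw) finite
        pw /= pw.sum(axis=1, keepdims=True)
    return pz, mu, var, pw

def annotate(F, pz, mu, var, pw):
    """P(w | f) = sum_z P(w | z) P(z | f): a probability for every word per image."""
    return concept_posterior(F, pz, mu, var) @ pw
```

Each row returned by `annotate` sums to one, so its entries can be read as a confidence for each candidate word, which is the role the abstract assigns to the Bayesian framework. The same fitted parameters could in principle be used in the other direction (ranking images for a query word), but that is not shown here.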
