Modeling continuous visual features for semantic image annotation and retrieval

Automatic image annotation has become an important and challenging problem due to the existence of semantic gap. In this paper, we firstly extend probabilistic latent semantic analysis (PLSA) to model continuous quantity. In addition, corresponding Expectation-Maximization (EM) algorithm is derived to determine the model parameters. Furthermore, in order to deal with the data of different modalities in terms of their characteristics, we present a semantic annotation model which employs continuous PLSA and standard PLSA to model visual features and textual words respectively. The model learns the correlation between these two modalities by an asymmetric learning approach and then it can predict semantic annotation precisely for unseen images. Finally, we compare our approach with several state-of-the-art approaches on the Corel5k and Corel30k datasets. The experiment results show that our approach performs more effectively and accurately.

[1]  Daniel Gatica-Perez,et al.  Modeling Semantic Aspects for Cross-Media Image Indexing , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  B. S. Manjunath,et al.  Texture Features for Browsing and Retrieval of Image Data , 1996, IEEE Trans. Pattern Anal. Mach. Intell..

[3]  Y. Mori,et al.  Image-to-word transformation based on dividing and vector quantizing images with words , 1999 .

[4]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.

[5]  Marcel Worring,et al.  Content-Based Image Retrieval at the End of the Early Years , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[6]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[7]  Edward Y. Chang,et al.  CBSA: content-based soft annotation for multimodal image retrieval using Bayes point machines , 2003, IEEE Trans. Circuits Syst. Video Technol..

[8]  R. Manmatha,et al.  Automatic image annotation and retrieval using cross-media relevance models , 2003, SIGIR.

[9]  Thorsten Brants,et al.  Test Data Likelihood for PLSA Models , 2005, Information Retrieval.

[10]  James Ze Wang,et al.  Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[11]  Wei-Ying Ma,et al.  A Probabilistic Semantic Model for Image Annotation and Multi-Modal Image Retrieva , 2005, ICCV.

[12]  Gustavo Carneiro,et al.  Supervised Learning of Semantic Classes for Image Annotation and Retrieval , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  R. Manmatha,et al.  Multiple Bernoulli relevance models for image and video annotation , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[14]  Wei-Ying Ma,et al.  A probabilistic semantic model for image annotation and multi-modal image retrieval , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[15]  R. Manmatha,et al.  A Model for Learning the Semantics of Pictures , 2003, NIPS.

[16]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[17]  David A. Forsyth,et al.  Matching Words and Pictures , 2003, J. Mach. Learn. Res..

[18]  Xi Liu,et al.  Automatic image annotation with continuous PLSA , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[19]  Jing Huang,et al.  Spatial Color Indexing and Applications , 2004, International Journal of Computer Vision.

[20]  James Ze Wang,et al.  Image retrieval: Ideas, influences, and trends of the new age , 2008, CSUR.

[21]  Raimondo Schettini,et al.  Image annotation using SVM , 2003, IS&T/SPIE Electronic Imaging.

[22]  Zhongzhi Shi,et al.  Modeling latent aspects for automatic image annotation , 2009, 2009 16th IEEE International Conference on Image Processing (ICIP).

[23]  Thomas Hofmann,et al.  Unsupervised Learning by Probabilistic Latent Semantic Analysis , 2004, Machine Learning.

[24]  David A. Forsyth,et al.  Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary , 2002, ECCV.

[25]  Michael I. Jordan,et al.  Modeling annotated data , 2003, SIGIR.

[26]  Claudio Gutierrez,et al.  Survey of graph database models , 2008, CSUR.