A semantic model for cross-modal and multi-modal retrieval

This paper studies a semantic model for cross-modal and multi-modal retrieval. We assume that the semantic correlation of multimedia data from different modalities can be described within a probabilistic generative framework: media items from different modalities are generated by the same semantic concepts, and the generation process of each item is conditionally independent given those concepts. Based on this assumption, we propose the semantic generation model (SGM) for cross-modal and multi-modal analysis. We study two methods for estimating the semantic conditional distributions of SGM: a direct method based on Gaussian distributions and an indirect method based on random forests. Methods for cross-modal and multi-modal retrieval are then derived from SGM. Experimental results show that the SGM-based cross-modal methods improve accuracy over a state-of-the-art cross-modal method without increasing computation time, and that the SGM-based multi-modal methods also outperform traditional methods in image retrieval. Moreover, the indirect SGM method outperforms the direct one on both retrieval tasks, which suggests that the indirect method better captures the semantic distribution.
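The conditional-independence assumption behind SGM can be illustrated with a minimal sketch. Assuming toy diagonal-Gaussian class-conditional densities (a stand-in for the paper's "direct" Gaussian estimator; all names, dimensions, and parameters below are hypothetical), cross-modal retrieval reduces to marginalizing over concepts: since p(img, txt | c) = p(img | c) p(txt | c), a text can be ranked against an image query by sum_c p(c | img) p(txt | c).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 3 semantic concepts, each generating a 4-d image feature
# and a 3-d text feature independently given the concept.
n_concepts, d_img, d_txt = 3, 4, 3
img_means = rng.normal(size=(n_concepts, d_img))
txt_means = rng.normal(size=(n_concepts, d_txt))

def log_gauss(x, means, var=0.5):
    # Diagonal-covariance Gaussian log-density of x under each concept mean
    # (a toy version of the direct Gaussian estimator).
    return -0.5 * np.sum((x - means) ** 2 / var + np.log(2 * np.pi * var), axis=-1)

def concept_posterior(x, means):
    # p(c | x) under a uniform concept prior, via a softmax of log-likelihoods.
    ll = log_gauss(x, means)
    ll -= ll.max()  # numerical stability
    p = np.exp(ll)
    return p / p.sum()

def cross_modal_score(img, txt):
    # Conditional independence given c implies
    # p(img, txt) = sum_c p(c) p(img | c) p(txt | c),
    # so texts are ranked by sum_c p(c | img) * p(txt | c).
    p_c = concept_posterior(img, img_means)
    return float(np.sum(p_c * np.exp(log_gauss(txt, txt_means))))

# An image query drawn near concept 0 should rank the concept-0 text highest.
img_query = img_means[0] + 0.1 * rng.normal(size=d_img)
texts = [txt_means[c] + 0.1 * rng.normal(size=d_txt) for c in range(n_concepts)]
scores = [cross_modal_score(img_query, t) for t in texts]
best = int(np.argmax(scores))
```

Replacing `log_gauss` with per-concept probabilities produced by a trained classifier (e.g. a random forest) yields the indirect variant: the marginalization over concepts is unchanged, only the estimator of p(x | c) differs.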
