Cross-Modal Self-Taught Learning for Image Retrieval

In recent years, cross-modal methods have been extensively studied in the multimedia literature. Many existing cross-modal methods rely on labeled training data, which is difficult to collect. In this paper we propose a cross-modal self-taught learning (CMSTL) algorithm that is learned from unlabeled multi-modal data. CMSTL adopts a two-stage self-taught scheme. In the multi-modal topic learning stage, both intra-modal similarity and multi-modal correlation are preserved, and different modalities are given different weights when learning the multi-modal topics. In the projection stage, soft assignment is used to learn the projection functions. Experimental results on Wikipedia articles and NUS-WIDE show the effectiveness of CMSTL in both cross-modal retrieval and image hashing.
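
The following is a minimal sketch of a two-stage self-taught pipeline in the spirit described above; it is not the authors' CMSTL algorithm. The concrete choices (weighted joint NMF for the topic stage, ridge regression for the projection stage, and the modality weights alpha_img / alpha_txt) are assumptions made purely for illustration.

```python
# Sketch of a two-stage self-taught cross-modal pipeline (illustrative only).
import numpy as np
from sklearn.decomposition import NMF
from sklearn.linear_model import Ridge
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)

# Unlabeled paired data: n co-occurring image/text samples (non-negative features).
n, d_img, d_txt, n_topics = 200, 64, 100, 10
X_img = rng.random((n, d_img))           # e.g. bag-of-visual-words histograms
X_txt = rng.random((n, d_txt))           # e.g. tf-idf text features

# ---- Stage 1: multi-modal topic learning (assumption: weighted joint NMF) ----
# Factorizing the weighted, concatenated modalities gives every sample one
# shared topic vector, so multi-modal correlation is kept; the weights let one
# modality contribute more to the topics than the other.
alpha_img, alpha_txt = 0.4, 0.6          # hypothetical modality weights
X_joint = np.hstack([alpha_img * X_img, alpha_txt * X_txt])
nmf = NMF(n_components=n_topics, init="nndsvda", max_iter=500, random_state=0)
H = nmf.fit_transform(X_joint)           # (n, n_topics) shared topic activations

# Soft assignment: keep a distribution over topics per sample instead of
# collapsing to the single strongest topic.
soft_labels = normalize(H, norm="l1", axis=1)

# ---- Stage 2: learn per-modality projection functions (assumption: ridge) ----
# Each modality gets its own regressor from raw features to the shared topic
# space, so unseen images or texts can be projected independently at query time.
proj_img = Ridge(alpha=1.0).fit(X_img, soft_labels)
proj_txt = Ridge(alpha=1.0).fit(X_txt, soft_labels)

# Cross-modal retrieval: project a text query and rank images in topic space.
q_txt = rng.random((1, d_txt))
q_topic = proj_txt.predict(q_txt)
img_topics = proj_img.predict(X_img)
scores = img_topics @ q_topic.T          # larger score = better match
ranking = np.argsort(-scores.ravel())
print("top-5 retrieved image indices:", ranking[:5])
```

For image hashing, the projected topic vectors could additionally be binarized (e.g. by thresholding each dimension at its median), but that step is likewise only an assumption about how the learned projections might be reused.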
