Combining Link and Content Correlation Learning for Cross-Modal Retrieval in Social Multimedia

With the rapid growth of multimedia data, cross-modal retrieval has received great attention. Generally, learning semantic correlation is the primary means of bridging the heterogeneity gap between modalities. Existing approaches usually focus on modeling cross-modal correlation and category correlation, which cannot capture semantic correlation thoroughly for social multimedia data. In fact, diverse link information is complementary, providing rich hints about semantic correlation. In this paper, we propose a novel cross-modal correlation learning approach based on subspace learning that takes both heterogeneous social link information and content information into account. Both intra-modal and inter-modal correlations are considered simultaneously by explicitly modeling link information. These correlations are then incorporated into the final representation, which further improves cross-modal retrieval performance. Experimental results demonstrate that the proposed approach outperforms several state-of-the-art cross-modal correlation learning approaches.
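To make the subspace-learning setting concrete: a classical baseline in this family is Canonical Correlation Analysis (CCA), which learns linear projections that map two modalities (e.g., image features and text features) into a shared subspace where paired samples are maximally correlated. The following is a minimal, self-contained sketch of CCA via whitening and SVD, using NumPy and synthetic paired features; it illustrates only the content-correlation baseline, not the paper's full link-aware model, and all names are illustrative.

```python
import numpy as np

def cca_projections(X, Y, k, reg=1e-3):
    """Learn k-dimensional CCA projections for two paired views.

    X: (n, dx) features of modality 1 (e.g., images)
    Y: (n, dy) features of modality 2 (e.g., text), paired row-wise with X
    reg: small ridge term for numerical stability of the covariances
    Returns projection matrices Wx (dx, k) and Wy (dy, k).
    """
    # Center both views.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Regularized covariance and cross-covariance matrices.
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):
        # Inverse matrix square root of a symmetric positive-definite matrix.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    # Whiten, then take the top-k singular directions of the
    # whitened cross-covariance M = Cxx^{-1/2} Cxy Cyy^{-1/2}.
    Wx_white, Wy_white = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Wx_white @ Cxy @ Wy_white)
    Wx = Wx_white @ U[:, :k]      # projects modality 1 into the shared subspace
    Wy = Wy_white @ Vt.T[:, :k]   # projects modality 2 into the shared subspace
    return Wx, Wy
```

Cross-modal retrieval then reduces to projecting a query from one modality and ranking items of the other modality by distance in the shared subspace. Link-aware extensions such as the proposed approach add graph-derived intra- and inter-modal constraints on top of this content-only objective.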
