Multi-modal Subspace Learning with Joint Graph Regularization for Cross-Modal Retrieval

This paper investigates the problem of cross-modal retrieval, where users can retrieve results across various modalities by submitting a query of any modality. Since the query and its retrieved results can belong to different modalities, measuring the content similarity between data of different modalities remains a challenge. To address this problem, we propose a joint graph regularized multi-modal subspace learning (JGRMSL) algorithm, which integrates inter-modality similarities and intra-modality similarities into a joint graph regularization to better exploit the cross-modal correlation and the local manifold structure within each modality. To obtain good class separation, the idea of Linear Discriminant Analysis (LDA) is incorporated into the proposed method by maximizing the between-class covariance and minimizing the within-class covariance of all projected data. Experimental results on two public cross-modal datasets demonstrate the effectiveness of our algorithm.
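
To make the structure of such an objective concrete, the following is a minimal sketch (not the authors' implementation) of how a joint graph regularizer over inter- and intra-modality similarities could be combined with LDA-style scatter terms for two paired, class-labeled modalities. The projection matrices `Ui`/`Ut`, the neighborhood size `k`, and the weights `alpha`/`beta` are illustrative assumptions rather than quantities taken from the paper.

```python
# Sketch of a JGRMSL-style objective for two paired modalities (e.g., image/text).
# Assumes Xi, Xt are n x d_i and n x d_t feature matrices with row-wise pairing
# and shared class labels; Ui, Ut project both modalities into a common subspace.
import numpy as np

def knn_similarity(X, k=5):
    """Intra-modality similarity: symmetric k-NN graph with cosine weights."""
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    S = Xn @ Xn.T
    W = np.zeros_like(S)
    for i in range(len(S)):
        idx = np.argsort(-S[i])[1:k + 1]   # k nearest neighbors, skipping self
        W[i, idx] = S[i, idx]
    return np.maximum(W, W.T)              # symmetrize

def joint_graph_objective(Ui, Ut, Xi, Xt, labels, alpha=1.0, beta=1.0, k=5):
    """Evaluate: joint graph regularization + within-class scatter - between-class scatter."""
    Zi, Zt = Xi @ Ui, Xt @ Ut              # project both modalities
    Z = np.vstack([Zi, Zt])                # all projected data

    # Joint graph: intra-modality k-NN blocks plus inter-modality links
    # connecting paired samples across the two modalities.
    n = len(Xi)
    W = np.zeros((2 * n, 2 * n))
    W[:n, :n] = knn_similarity(Xi, k)
    W[n:, n:] = knn_similarity(Xt, k)
    W[:n, n:] += np.eye(n)
    W[n:, :n] += np.eye(n)
    L = np.diag(W.sum(1)) - W              # graph Laplacian
    graph_reg = np.trace(Z.T @ L @ Z)      # smoothness of projections on the joint graph

    # LDA-style scatter terms on the projected data (both modalities share labels).
    y = np.concatenate([labels, labels])
    mu = Z.mean(0)
    d = Z.shape[1]
    Sb, Sw = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Zc = Z[y == c]
        diff = Zc.mean(0) - mu
        Sb += len(Zc) * np.outer(diff, diff)
        Sw += (Zc - Zc.mean(0)).T @ (Zc - Zc.mean(0))

    # Smaller is better: penalize graph roughness and within-class scatter,
    # reward between-class scatter.
    return graph_reg + alpha * np.trace(Sw) - beta * np.trace(Sb)
```

In practice an objective of this form would be minimized over the projection matrices, for example via a generalized eigenvalue problem or gradient-based optimization; the sketch above only evaluates it for given projections.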
