Efficient cross-modal retrieval via flexible supervised collective matrix factorization hashing

Cross-modal retrieval has recently drawn much attention in multimedia analysis, and it remains a challenging topic, mainly because of its heterogeneous nature. In this paper, we propose flexible supervised collective matrix factorization hashing (FS-CMFH) for efficient cross-modal retrieval. First, we exploit a flexible collective matrix factorization framework to jointly learn an individual latent semantic space for each modality, such that these spaces carry similar semantics. Meanwhile, the label consistency across different modalities is exploited to preserve both intra-modal and inter-modal semantics within these similar latent semantic spaces. Accordingly, these two ingredients are formulated as a joint graph regularization term in an overall objective function, through which similar hash codes for the different modalities of an instance can be obtained discriminatively, so that each instance is characterized flexibly. As a result, the derived hash codes have higher discriminative power and improve cross-modal search accuracy significantly. Extensive experiments on three popular benchmark datasets show that the proposed approach performs favorably compared with state-of-the-art competing approaches.
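To make the idea of per-modality latent spaces that are kept similar more concrete, the following is a minimal sketch and not the FS-CMFH algorithm itself. It assumes a simplified objective, sum_m ||X_m - U_m V_m||_F^2 + mu ||V_1 - V_2||_F^2 + gamma (||U_m||_F^2 + ||V_m||_F^2), solved by alternating regularized least squares, and it omits the label-based graph regularization described above. The function names and the parameters `k`, `mu`, and `gamma` are illustrative assumptions.

```python
import numpy as np


def objective(X, U, V, mu, gamma):
    """Simplified coupled-factorization loss (an assumption, not the paper's objective)."""
    loss = sum(np.linalg.norm(X[m] - U[m] @ V[m]) ** 2 for m in range(2))
    loss += mu * np.linalg.norm(V[0] - V[1]) ** 2
    loss += gamma * sum(
        np.linalg.norm(U[m]) ** 2 + np.linalg.norm(V[m]) ** 2 for m in range(2)
    )
    return loss


def coupled_mf_hash(X1, X2, k=8, mu=1.0, gamma=1e-2, iters=30, seed=0):
    """Factorize two modalities (features x instances) into coupled latent codes.

    Returns one {-1, +1} code matrix (k x n) per modality and the loss history.
    """
    rng = np.random.default_rng(seed)
    X = [X1, X2]
    n = X1.shape[1]
    U = [rng.standard_normal((x.shape[0], k)) for x in X]
    V = [rng.standard_normal((k, n)) for _ in X]
    I = np.eye(k)
    history = []
    for _ in range(iters):
        for m in range(2):
            # Closed-form update: U_m = X_m V_m^T (V_m V_m^T + gamma I)^{-1}
            U[m] = np.linalg.solve(V[m] @ V[m].T + gamma * I, V[m] @ X[m].T).T
        for m in range(2):
            # Closed-form update of V_m given the other modality's latent factors:
            # (U_m^T U_m + (mu + gamma) I) V_m = U_m^T X_m + mu V_other
            V[m] = np.linalg.solve(
                U[m].T @ U[m] + (mu + gamma) * I,
                U[m].T @ X[m] + mu * V[1 - m],
            )
        history.append(objective(X, U, V, mu, gamma))
    # Binarize by thresholding each latent dimension at its mean.
    codes = [np.where(v - v.mean(axis=1, keepdims=True) >= 0, 1, -1) for v in V]
    return codes, history
```

With paired data, for example image features `X1` and text features `X2` whose columns correspond to the same instances, `coupled_mf_hash(X1, X2, k=16)` returns one binary code matrix per modality. Increasing `mu` pulls the two latent spaces, and hence the two sets of codes, closer together.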
