Label consistent matrix factorization based hashing for cross-modal retrieval

Matrix factorization-based hashing has proven very effective for the cross-modal retrieval task. In this work, we propose a novel supervised hashing approach, built on matrix factorization, that seamlessly incorporates label information. In the proposed approach, latent factors are generated for each individual modality and then mapped to the more discriminative label space using modality-specific linear transformations. The approach has two stages: in the first, the hash codes are learned using an alternating minimization algorithm; in the second, modality-specific hash functions are learned to map the original features of the cross-modal data into the hash code domain. We also propose an extension of the approach for handling very large amounts of data during the training stage. Extensive experiments on the single-label Wiki dataset and the multi-label MirFlickr and NUS-WIDE datasets show the effectiveness of the proposed approach.
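The two-stage structure described above can be illustrated with a minimal NumPy sketch. The abstract does not state the objective function, so everything specific here is an assumption made for illustration and not the authors' formulation: the objective (squared reconstruction error for each modality, plus a label-fitting term weighted by a hypothetical `alpha`, plus ridge regularization with a hypothetical `lam`), the use of a single shared latent matrix `V` with one label projection `P` (a simplification of the per-modality latent factors and modality-specific transformations mentioned above), the sign-based binarization, and the ridge-regression hash functions. All function names are hypothetical.

```python
import numpy as np

def ridge_solve(A, B, lam):
    """Return argmin_W ||B - W A||_F^2 + lam * ||W||_F^2 (columns are samples)."""
    d = A.shape[0]
    return np.linalg.solve(A @ A.T + lam * np.eye(d), A @ B.T).T

def binarize(Z):
    """Map real values to {-1, +1}; zeros are sent to +1."""
    return np.where(Z >= 0, 1, -1)

def objective(Xs, Y, Us, P, V, alpha, lam):
    """Assumed objective: reconstruction + label fit + ridge penalty."""
    total = alpha * np.sum((Y - P @ V) ** 2) + lam * (np.sum(P ** 2) + np.sum(V ** 2))
    for X, U in zip(Xs, Us):
        total += np.sum((X - U @ V) ** 2) + lam * np.sum(U ** 2)
    return total

def learn_codes(Xs, Y, k, alpha=1.0, lam=1e-2, iters=20, seed=0):
    """Stage 1 (assumed form): alternating minimization, then binarization."""
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((k, Y.shape[1]))
    history = []
    for _ in range(iters):
        # Each update is the exact minimizer of the objective over one block.
        Us = [ridge_solve(V, X, lam) for X in Xs]
        P = ridge_solve(V, Y, lam / alpha)
        lhs = sum(U.T @ U for U in Us) + alpha * P.T @ P + lam * np.eye(k)
        rhs = sum(U.T @ X for U, X in zip(Us, Xs)) + alpha * P.T @ Y
        V = np.linalg.solve(lhs, rhs)
        history.append(objective(Xs, Y, Us, P, V, alpha, lam))
    # Center each latent dimension before taking signs.
    B = binarize(V - V.mean(axis=1, keepdims=True))
    return B, history

def learn_hash_functions(Xs, B, lam=1e-2):
    """Stage 2 (assumed form): one linear map per modality, features -> codes."""
    return [ridge_solve(X, B, lam) for X in Xs]

def hash_codes(W, X):
    """Hash the columns of X with the linear map W of the matching modality."""
    return binarize(W @ X)
```

In this sketch, `Xs` is a list of feature matrices (one per modality, samples as columns) and `Y` is the label matrix with one column per sample. At query time only `hash_codes` is needed, so a query from one modality can be compared with database items from another by Hamming distance.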
