Robust Unsupervised Cross-modal Hashing for Multimedia Retrieval

With the rapid development of social websites, different media types (such as text, image, and video) describing the same topic are increasingly available from large-scale heterogeneous data sources. To efficiently identify inter-media correlations for multimedia retrieval, unsupervised cross-modal hashing (UCMH) has gained increasing interest due to its significant reduction in computation and storage. However, most UCMH methods assume that the data from different modalities are well paired; as a result, they may not achieve satisfactory performance when only partially paired data are available. In this article, we propose a new UCMH method called robust unsupervised cross-modal hashing (RUCMH). Its major contribution lies in jointly learning modality-specific hash functions, exploring the correlations among modalities with partial or even no pairwise correspondence, and preserving the information of the original features as much as possible. The learning process is modeled as a joint minimization problem, and the corresponding optimization algorithm is presented. A series of experiments conducted on four real-world datasets (Wiki, MIRFlickr, NUS-WIDE, and MS-COCO) demonstrates that RUCMH significantly outperforms state-of-the-art unsupervised cross-modal hashing methods, especially in the partially paired case, which validates its effectiveness.
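To make the retrieval setting concrete, the sketch below illustrates the general cross-modal hashing pipeline the abstract refers to: each modality gets its own hash function mapping features to short binary codes, and retrieval across modalities is done by Hamming distance between codes. This is a minimal, generic illustration only; the hash functions here are plain random projections with sign binarization, not the projections RUCMH learns from its joint minimization problem, and all feature dimensions are assumed for the example.

```python
import numpy as np

def make_hash_function(dim, n_bits, rng):
    # Placeholder modality-specific hash function: a random linear
    # projection followed by sign binarization. In RUCMH the projection
    # would instead be learned jointly across modalities.
    W = rng.standard_normal((dim, n_bits))
    return lambda X: (X @ W > 0).astype(np.uint8)

def hamming_retrieval(query_code, db_codes, k=5):
    # Rank database items by Hamming distance to the query code and
    # return the indices of the k nearest ones.
    dists = (db_codes != query_code).sum(axis=1)
    return np.argsort(dists)[:k]

rng = np.random.default_rng(0)
hash_img = make_hash_function(512, 32, rng)  # image modality (assumed 512-d features)
hash_txt = make_hash_function(300, 32, rng)  # text modality (assumed 300-d features)

img_db = rng.standard_normal((1000, 512))    # image database features
txt_query = rng.standard_normal((1, 300))    # one text query

db_codes = hash_img(img_db)                  # 1000 x 32 binary codes
q_code = hash_txt(txt_query)[0]              # 32-bit query code
top5 = hamming_retrieval(q_code, db_codes, k=5)
print(top5)
```

Because both modalities are mapped into the same 32-bit code space, a text query can be matched against image codes directly; the quality of such retrieval depends entirely on how well the two hash functions are aligned, which is exactly what RUCMH's joint learning addresses.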