Semi-supervised Cross-Modal Hashing with Graph Convolutional Networks

Cross-modal hashing for large-scale approximate nearest neighbor search has recently attracted great attention because of its computational and storage efficiency. However, generating high-quality binary codes that preserve both inter-modal and intra-modal semantics remains challenging, especially in a semi-supervised setting. In this paper, we propose a semi-supervised cross-modal discrete code learning framework. This is the first work to apply asymmetric graph convolutional networks (GCNs) to scalable cross-modal retrieval. Specifically, the architecture contains multiple GCN branches, one per data modality, each of which extracts modality-specific features and then generates binary hash codes that are unified across modalities, so that the underlying cross-modal correlations and similarities are simultaneously preserved in the hash codes. Moreover, the branches are built from asymmetric graph convolutional layers, which use randomly sampled anchors to address the scalability and out-of-sample issues of graph learning and to reduce the complexity of computing cross-modal similarities. Extensive experiments on benchmark datasets demonstrate that our method achieves superior retrieval performance compared with state-of-the-art methods.
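The anchor-based asymmetric propagation summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation; it assumes the anchor-graph approximation commonly used in scalable graph hashing (Gaussian sample-to-anchor affinities Z with the n x n adjacency approximated as Z Λ⁻¹ Zᵀ, where Λ holds the column sums of Z), a single ReLU layer per modality, a shared linear projection followed by a sign function to produce codes, and random, untrained weights. The function names (`anchor_affinity`, `asymmetric_gcn_layer`, `hash_codes`, `hamming_distances`) and the `sigma` bandwidths are hypothetical.

```python
import numpy as np


def anchor_affinity(X, anchors, sigma=1.0):
    """Row-normalized Gaussian affinity between samples X (n x d) and anchors (m x d)."""
    # Squared Euclidean distance from every sample to every anchor: n x m.
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=-1)
    # Subtracting each row's minimum leaves the row-normalized result unchanged
    # but keeps the largest entry at exp(0) = 1, so no row underflows to zero.
    Z = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * sigma ** 2))
    return Z / Z.sum(axis=1, keepdims=True)


def asymmetric_gcn_layer(Z_out, Z_in, H_in, W):
    """ReLU(Z_out @ inv(Lam) @ Z_in.T @ H_in @ W), Lam = diag(column sums of Z_in).

    The n x n adjacency is never materialized: features are aggregated at the
    m anchors and then redistributed, costing O(n * m * d) instead of O(n^2 * d).
    With Z_out = Z_in this is the in-sample case; an unseen query only needs
    its own anchor affinities Z_out.
    """
    lam_inv = 1.0 / np.maximum(Z_in.sum(axis=0), 1e-12)
    anchor_messages = Z_in.T @ (H_in @ W)                 # m x d_out
    return np.maximum(0.0, (Z_out * lam_inv) @ anchor_messages)


def hash_codes(H, W_hash):
    """Binarize a linear projection into {-1, +1} codes (0 is mapped to +1)."""
    return np.where(H @ W_hash >= 0, 1, -1)


def hamming_distances(query_code, db_codes):
    """Hamming distance between one {-1, +1} code and each row of db_codes."""
    k = query_code.shape[-1]
    # For +/-1 vectors, <a, b> = k - 2 * hamming(a, b).
    return (k - db_codes @ query_code) // 2


rng = np.random.default_rng(0)
n, m, k = 200, 16, 32                        # samples, anchors, code length
X_img = rng.normal(size=(n, 64))             # stand-in image features
X_txt = rng.normal(size=(n, 20))             # stand-in text features
anchor_idx = rng.choice(n, size=m, replace=False)  # same anchor pairs in both modalities

Z_img = anchor_affinity(X_img, X_img[anchor_idx], sigma=8.0)
Z_txt = anchor_affinity(X_txt, X_txt[anchor_idx], sigma=4.0)

# Untrained random weights: the shapes are meaningful, the rankings are not.
W_img, W_txt = rng.normal(size=(64, 48)), rng.normal(size=(20, 48))
W_hash = rng.normal(size=(48, k))            # shared projection into one Hamming space

B_img = hash_codes(asymmetric_gcn_layer(Z_img, Z_img, X_img, W_img), W_hash)

# Out-of-sample text query: only its affinity to the text anchors is required.
x_new = rng.normal(size=(1, 20))
Z_new = anchor_affinity(x_new, X_txt[anchor_idx], sigma=4.0)
b_new = hash_codes(asymmetric_gcn_layer(Z_new, Z_txt, X_txt, W_txt), W_hash)[0]
ranking = np.argsort(hamming_distances(b_new, B_img))  # image indices, nearest first
```

Routing every message through the m anchors is what keeps the cost linear in the number of samples, and because a new item interacts with the database only through its own row of anchor affinities, it can be encoded without rebuilding the graph. In a trained model the weights would be learned from the labeled and unlabeled data rather than drawn at random.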
