Discriminative deep asymmetric supervised hashing for cross-modal retrieval

Abstract Owing to its low storage cost and high retrieval efficiency, cross-modal hashing has received considerable attention. Most existing deep cross-modal hashing methods adopt a symmetric strategy, learning the same deep hash functions for both query instances and database instances. However, training these symmetric methods is time-consuming, which makes it hard for them to exploit the supervised information effectively on large-scale datasets. Inspired by recent advances in asymmetric hashing, we propose discriminative deep asymmetric supervised hashing (DDASH) for cross-modal retrieval. Specifically, asymmetric hashing learns the hash codes of query instances with deep hash functions, while the hash codes of database instances are learned through hand-crafted matrices. This not only makes full use of the information in large-scale datasets but also reduces training time. In addition, we introduce discrete optimization to reduce the binary quantization error. Furthermore, a mapping matrix that maps the generated hash codes to their corresponding labels is introduced to ensure that the hash codes are discriminative. We also compute the level of similarity between instances and use it as supervised information. Experiments on three common cross-modal retrieval datasets show that DDASH outperforms state-of-the-art cross-modal hashing methods.
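The three ingredients named in the abstract (graded similarity as supervision, an asymmetric query/database loss, and a code-to-label mapping matrix) can be illustrated with a minimal NumPy sketch. This is not the DDASH objective, which the abstract does not spell out. The cosine similarity over label vectors and the squared inner-product loss are assumptions borrowed from common asymmetric hashing formulations, and every function and variable name here is hypothetical.

```python
import numpy as np


def label_similarity(labels_a, labels_b):
    """Assumed similarity level: cosine similarity of multi-hot label vectors.

    Returns a matrix with entries in [0, 1] for non-negative labels.
    """
    norm_a = np.maximum(np.linalg.norm(labels_a, axis=1, keepdims=True), 1e-12)
    norm_b = np.maximum(np.linalg.norm(labels_b, axis=1, keepdims=True), 1e-12)
    return (labels_a / norm_a) @ (labels_b / norm_b).T


def asymmetric_loss(query_out, db_codes, sim):
    """Assumed asymmetric loss: squared gap between inner products and targets.

    query_out: (m, c) real-valued outputs of a deep hash function (queries).
    db_codes:  (n, c) binary codes in {-1, +1}, learned directly (database).
    sim:       (m, n) similarity levels in [0, 1].
    """
    code_len = db_codes.shape[1]
    # Rescale similarity from [0, 1] to [-c, +c], the range of code inner products.
    target = code_len * (2.0 * sim - 1.0)
    return float(np.sum((query_out @ db_codes.T - target) ** 2))


def label_mapping_loss(db_codes, labels, mapping):
    """Assumed discriminative term: how well a (c, k) matrix maps codes to labels."""
    return float(np.sum((labels - db_codes @ mapping) ** 2))


# Toy usage: one query, two database items, 4-bit codes, 2 label classes.
query_labels = np.array([[1, 0]])
db_labels = np.array([[1, 0], [0, 1]])
db_codes = np.array([[1, 1, 1, 1], [-1, -1, -1, -1]])
query_out = np.array([[1.0, 1.0, 1.0, 1.0]])

sim = label_similarity(query_labels, db_labels)
pair_loss = asymmetric_loss(query_out, db_codes, sim)
map_loss = label_mapping_loss(db_codes, db_labels, np.zeros((4, 2)))
```

Only the query side passes through a network in this kind of setup. The database codes are free variables updated by discrete optimization, so the cost of a training step does not require a forward pass over the whole database.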
