Multi-Similarity Semantic Correctional Hashing For Cross Modal Retrieval

Because of their low storage requirements and high retrieval efficiency, hashing methods have attracted considerable attention for large-scale cross-modal retrieval, and significant progress has been made recently. However, existing methods generally use a label-guided similarity matrix to measure the similarity of sample pairs, which limits their semantic representation capability. Moreover, sample imbalance across classes biases the learning process toward majority classes and degrades retrieval performance. To boost the semantic representation, alleviate the impact of data imbalance, and obtain a high ranking correlation between hash code pairs, we propose a novel hashing method that uses a semantic correctional similarity matrix to enhance the embedded representation of sample pairs. Furthermore, we propose a novel cross-modal multi-similarity loss, based on the general pair weighting framework, that collects and weights informative pairs efficiently and accurately, thus improving retrieval performance. Our analysis and experimental results demonstrate that, compared with recent cross-modal retrieval methods, our method achieves better retrieval performance on two datasets, MIRFlickr-25K and NUS-WIDE.
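As a rough illustration of the two ingredients named above, the following NumPy sketch builds a plain binary label-guided similarity matrix (the baseline the abstract criticizes) and evaluates a multi-similarity loss of the general pair weighting kind across two modalities. It is not the paper's formulation: the semantic correctional matrix and the pair-mining step are not specified in this abstract and are omitted here. The function names, the use of cosine similarity between relaxed codes, and the hyperparameter values `alpha`, `beta`, and `lam` are assumptions made for illustration only.

```python
import numpy as np


def label_similarity(labels_a, labels_b):
    """Binary label-guided similarity: 1 if two samples share a label, else 0."""
    return (labels_a @ labels_b.T > 0).astype(np.float64)


def cross_modal_ms_loss(img_codes, txt_codes, sim, alpha=2.0, beta=50.0, lam=0.5):
    """Multi-similarity loss with image codes as anchors and text codes as candidates.

    Hypothetical sketch: no pair mining, cosine similarity between codes,
    and illustrative hyperparameter defaults.
    """
    # Cosine similarity between every image code and every text code.
    a = img_codes / np.linalg.norm(img_codes, axis=1, keepdims=True)
    b = txt_codes / np.linalg.norm(txt_codes, axis=1, keepdims=True)
    cos = a @ b.T

    pos = sim > 0   # cross-modal pairs that share at least one label
    neg = ~pos      # cross-modal pairs with no label in common

    # Positive pairs are penalized when their similarity falls below lam,
    # negative pairs when it rises above lam; the log-sum-exp soft-weights
    # the hardest pairs most heavily.
    pos_term = np.log1p(np.sum(np.exp(-alpha * (cos - lam)) * pos, axis=1)) / alpha
    neg_term = np.log1p(np.sum(np.exp(beta * (cos - lam)) * neg, axis=1)) / beta
    return float(np.mean(pos_term + neg_term))


# Toy data: four samples, two classes, 4-bit codes in {-1, +1}.
labels = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=np.float64)
sim = label_similarity(labels, labels)

img_codes = np.array([[1, 1, -1, -1], [1, 1, -1, -1],
                      [-1, -1, 1, 1], [-1, -1, 1, 1]], dtype=np.float64)
aligned_txt = img_codes.copy()   # text codes agree with the labels
flipped_txt = -img_codes         # text codes contradict the labels

loss_aligned = cross_modal_ms_loss(img_codes, aligned_txt, sim)
loss_flipped = cross_modal_ms_loss(img_codes, flipped_txt, sim)
print(loss_aligned < loss_flipped)
```

In this toy setup the loss is lower when the text codes agree with the label structure than when they contradict it, which is the behavior a pair-weighting loss is meant to reward. A binary matrix like `sim` treats every pair sharing any label as equally similar, which is the limitation the abstract attributes to label-guided similarity.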
