Hierarchical semantic interaction-based deep hashing network for cross-modal retrieval

Because hashing is efficient and deep networks learn highly abstract representations, deep hashing has achieved both strong accuracy and high efficiency in large-scale cross-modal retrieval. However, two challenges remain for high-performance cross-modal hashing retrieval: how to efficiently measure the similarity of fine-grained multi-labels across multi-modal data, and how to thoroughly exploit the layer-specific information held in the intermediate layers of the networks. In this paper, we propose a novel Hierarchical Semantic Interaction-based Deep Hashing Network (HSIDHN) for large-scale cross-modal retrieval. In HSIDHN, multi-scale and fusion operations are first applied to each layer of the network. A Bidirectional Bi-linear Interaction (BBI) policy is then designed to achieve hierarchical semantic interaction among different layers, which enhances the capability of the hash representations. Moreover, a dual-similarity measurement (“hard” similarity and “soft” similarity) is designed to calculate the semantic similarity between data of different modalities, aiming to better preserve the semantic correlation of multi-labels. Extensive experimental results on two large-scale public datasets show that the performance of HSIDHN is competitive with state-of-the-art deep cross-modal hashing methods.
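
The abstract does not define the two similarity measures, so the following is only a minimal sketch of how a dual-similarity measurement over multi-label annotations might look. It assumes, as is common in multi-label hashing work, that "hard" similarity is a binary indicator of whether two samples share at least one label, and that "soft" similarity is the cosine similarity of their multi-hot label vectors. The function names and the toy label vectors are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def hard_similarity(a: Sequence[int], b: Sequence[int]) -> int:
    """Assumed 'hard' similarity: 1 if the label vectors share any label, else 0."""
    return int(any(x and y for x, y in zip(a, b)))


def soft_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Assumed 'soft' similarity: cosine similarity of two multi-hot label vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Samples without any label are treated as dissimilar to everything.
    return dot / norm if norm > 0 else 0.0


def similarity_matrices(
    image_labels: Sequence[Sequence[int]],
    text_labels: Sequence[Sequence[int]],
) -> Tuple[List[List[int]], List[List[float]]]:
    """Build image-to-text hard and soft similarity matrices."""
    hard = [[hard_similarity(a, b) for b in text_labels] for a in image_labels]
    soft = [[soft_similarity(a, b) for b in text_labels] for a in image_labels]
    return hard, soft


if __name__ == "__main__":
    # Toy multi-hot labels over four hypothetical categories.
    images = [[1, 1, 0, 0], [0, 0, 1, 0]]
    texts = [[1, 0, 0, 0], [1, 1, 0, 0], [0, 0, 0, 1]]
    hard, soft = similarity_matrices(images, texts)
    print("hard:", hard)
    print("soft:", [[round(v, 3) for v in row] for row in soft])
```

Under these assumptions, the first image is "hard"-similar to both the first and second texts, but the "soft" score separates them: the text sharing both labels scores higher than the one sharing only one. This is the kind of fine-grained multi-label distinction that a binary indicator alone cannot express.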
