Paper information - Hierarchical semantic interaction-based deep hashing network for cross-modal retrieval

Owing to the efficiency of hashing and the abstraction capability of deep networks, deep hashing has achieved appealing effectiveness and efficiency in large-scale cross-modal retrieval. However, two challenges remain for high-performance cross-modal hashing retrieval: how to efficiently measure the similarity of fine-grained multi-labels across multi-modal data, and how to thoroughly exploit the layer-specific information in the intermediate layers of the networks. In this paper, we therefore propose a novel Hierarchical Semantic Interaction-based Deep Hashing Network (HSIDHN) for large-scale cross-modal retrieval. In the proposed HSIDHN, multi-scale and fusion operations are first applied to each layer of the network. A Bidirectional Bi-linear Interaction (BBI) policy is then designed to achieve hierarchical semantic interaction among different layers, which enhances the capability of the hash representations. Moreover, a dual-similarity measurement ("hard" similarity and "soft" similarity) is designed to calculate the semantic similarity between data of different modalities, aiming to better preserve the semantic correlation of multi-labels. Extensive experimental results on two large-scale public datasets show that the performance of HSIDHN is competitive with state-of-the-art deep cross-modal hashing methods.
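To make the distinction between "hard" and "soft" similarity concrete, the following is a minimal sketch under stated assumptions, not the formulation used in the paper. It assumes that labels are multi-hot vectors, that hard similarity is 1 when two samples share at least one label and 0 otherwise (a common convention in cross-modal hashing), and that soft similarity is the cosine similarity of the label vectors. The function names `hard_similarity` and `soft_similarity` are hypothetical.

```python
import math
from typing import Sequence


def hard_similarity(a: Sequence[int], b: Sequence[int]) -> int:
    """Assumed definition: 1 if the two multi-hot label vectors share
    at least one label, otherwise 0."""
    return int(any(x and y for x, y in zip(a, b)))


def soft_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Assumed definition: cosine similarity of the two label vectors,
    so a larger label overlap gives a larger score in [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_sq = sum(x * x for x in a) * sum(y * y for y in b)
    if norm_sq == 0:
        return 0.0  # a sample with no labels is treated as dissimilar
    return dot / math.sqrt(norm_sq)


# Hypothetical label vectors over four classes.
image_labels = [1, 1, 0, 1]
text_full = [1, 1, 0, 1]     # shares all three labels with the image
text_partial = [1, 0, 0, 0]  # shares one label
text_none = [0, 0, 1, 0]     # shares no label

for name, text_labels in [("full", text_full),
                          ("partial", text_partial),
                          ("none", text_none)]:
    print(name,
          hard_similarity(image_labels, text_labels),
          round(soft_similarity(image_labels, text_labels), 3))
```

Under these assumptions, the full and partial matches receive the same hard similarity of 1, while the soft similarity separates them (1.0 versus roughly 0.577). This graded signal is the kind of fine-grained multi-label information that a binary measure alone cannot express.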
