Deep Semantic-Preserving Reconstruction Hashing for Unsupervised Cross-Modal Retrieval

Deep hashing is the mainstream approach to large-scale cross-modal retrieval owing to its fast retrieval speed and low storage cost, but reconstructing modal semantic information remains very challenging. To further address semantic reconstruction in unsupervised cross-modal retrieval, we propose a novel deep semantic-preserving reconstruction hashing (DSPRH) algorithm. DSPRH combines spatial and channel semantic information, and mines modal semantics through adaptive auto-encoding and a joint semantic reconstruction loss. The main contributions are as follows: (1) We introduce a new spatial pooling network module based on tensor rank-one decomposition theory, which generates rank-1 tensors to capture high-order contextual semantics and helps the backbone network capture important contextual modal semantic information. (2) From an optimization perspective, we use global covariance pooling to capture channel semantic information and accelerate network convergence. In the feature reconstruction layer, we use two bottleneck auto-encoders to achieve visual-text modal interaction. (3) For metric learning, we design a new loss function to optimize the model parameters while preserving the correlation between the image and text modalities. DSPRH is evaluated on MIRFlickr-25K and NUS-WIDE, and the experimental results show that it achieves better retrieval performance than existing methods.
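To make contribution (2) concrete, the following is a minimal NumPy sketch of global covariance pooling: channel-wise feature maps are flattened, centered, and turned into a channel-by-channel covariance matrix, followed by a matrix square-root normalization, which is commonly reported to improve conditioning and speed up convergence. The function name, input shape, and normalization choice are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def global_covariance_pooling(features):
    """Second-order (covariance) pooling of CNN features.

    features: array of shape (C, H, W), one feature map per channel.
    Returns a symmetric (C, C) matrix of channel correlations,
    normalized by its matrix square root (an illustrative choice).
    """
    C, H, W = features.shape
    X = features.reshape(C, H * W)           # flatten spatial dimensions
    X = X - X.mean(axis=1, keepdims=True)    # center each channel
    cov = X @ X.T / (H * W - 1)              # channel covariance matrix
    # Matrix square-root normalization via eigendecomposition;
    # eigenvalues are clipped at zero for numerical safety.
    w, V = np.linalg.eigh(cov)
    w = np.clip(w, 0.0, None)
    return (V * np.sqrt(w)) @ V.T

# Example: pool a random 8-channel, 7x7 feature tensor into an 8x8 descriptor.
desc = global_covariance_pooling(np.random.rand(8, 7, 7))
```

The resulting (C, C) descriptor replaces first-order global average pooling, so the hashing layers see channel co-activation statistics rather than per-channel means.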
