Dual-pathway Attention based Supervised Adversarial Hashing for Cross-modal Retrieval

Due to the success of deep learning in recent years, cross-modal retrieval has made significant progress. However, a key challenge remains: how to learn the correlations between data of different modalities more effectively so as to improve retrieval accuracy. In this paper, we therefore propose Dual-pathway Attention based Supervised Adversarial Hashing (DASAH) to obtain a unified cross-modal semantic representation. Dual-pathway attention, that is, learning the attention of image regions (text sequences) to text sequences (image regions), deeply mines the fine-grained semantic correlations between the two modalities, and adversarial learning is incorporated to further strengthen the learning of cross-modal semantic correlation. The model makes full use of dual-pathway attention to guide fine-grained cross-modal feature learning, and integrates this fine-grained cross-modal feature learning with adversarial hashing learning in a unified framework for joint learning and optimization. Extensive empirical studies show that the proposed method outperforms several state-of-the-art methods for cross-modal retrieval.
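The abstract does not specify the exact attention formulation, so the following is only a minimal sketch of the general dual-pathway idea it describes: image regions attending over text tokens and text tokens attending over image regions. All names (DualPathwayAttention, img_query, feature shapes, etc.) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of dual-pathway (bidirectional) cross-modal attention,
# assuming image-region features of shape (B, R, d) and text-token features
# of shape (B, T, d). Names are hypothetical, not taken from the paper.
import torch
import torch.nn as nn


class DualPathwayAttention(nn.Module):
    """Image regions attend to text tokens, and text tokens attend to image regions."""

    def __init__(self, dim: int):
        super().__init__()
        self.img_query = nn.Linear(dim, dim)
        self.txt_query = nn.Linear(dim, dim)
        self.img_key = nn.Linear(dim, dim)
        self.txt_key = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, img_feats: torch.Tensor, txt_feats: torch.Tensor):
        # Pathway 1: each image region attends over all text tokens.
        attn_i2t = torch.softmax(
            self.img_query(img_feats) @ self.txt_key(txt_feats).transpose(1, 2) * self.scale,
            dim=-1,
        )                                    # (B, R, T)
        img_attended = attn_i2t @ txt_feats  # text-aware image features, (B, R, d)

        # Pathway 2: each text token attends over all image regions.
        attn_t2i = torch.softmax(
            self.txt_query(txt_feats) @ self.img_key(img_feats).transpose(1, 2) * self.scale,
            dim=-1,
        )                                    # (B, T, R)
        txt_attended = attn_t2i @ img_feats  # image-aware text features, (B, T, d)

        return img_attended, txt_attended


# Example: 36 image regions and 20 text tokens, both embedded in 512 dimensions.
if __name__ == "__main__":
    module = DualPathwayAttention(dim=512)
    img = torch.randn(8, 36, 512)
    txt = torch.randn(8, 20, 512)
    img_out, txt_out = module(img, txt)
    print(img_out.shape, txt_out.shape)  # torch.Size([8, 36, 512]) torch.Size([8, 20, 512])
```

In this sketch, the two attended outputs would feed downstream hashing layers; how DASAH fuses them with the adversarial hashing objective is described in the body of the paper, not here.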