Learning DALTS for cross-modal retrieval
Cross-modal retrieval has recently been proposed to find an appropriate subspace in which the similarity across different modalities, such as image and text, can be measured directly. In this study, unlike most existing works, the authors propose a novel model for cross-modal retrieval based on a domain-adaptive limited text space (DALTS) rather than a common space or an image space. Experimental results on three widely used datasets, Flickr8K, Flickr30K and Microsoft Common Objects in Context (MSCOCO), show that the proposed method, dubbed DALTS, learns superior text-space features that effectively capture the information needed for cross-modal retrieval. Moreover, DALTS achieves promising improvements in retrieval accuracy over current state-of-the-art methods.
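To make the retrieval setting concrete, the sketch below shows one common way such similarity is measured once features from both modalities live in a shared (here, text-like) space: project the image feature into that space and rank candidate sentences by cosine similarity. This is a minimal illustration under assumed shapes and a generic learned projection, not the authors' DALTS model.

```python
# Minimal sketch (not the authors' DALTS model): rank candidate sentences for an
# image by cosine similarity after projecting the image feature into a text space.
# Feature dimensions, encoders and the projection matrix are assumptions.
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def retrieve_text(image_feat, text_feats, image_to_text_proj, top_k=5):
    """Rank candidate text embeddings for one image feature.

    image_feat:         (d_img,)  raw image feature (e.g. from a CNN)
    text_feats:         (n, d_txt) embeddings of candidate sentences
    image_to_text_proj: (d_img, d_txt) learned mapping into the text space
    """
    query = l2_normalize(image_feat @ image_to_text_proj)  # image mapped into text space
    candidates = l2_normalize(text_feats)
    scores = candidates @ query                             # cosine similarities
    return np.argsort(-scores)[:top_k], scores

# Toy usage with random stand-in features.
rng = np.random.default_rng(0)
img = rng.normal(size=4096)              # hypothetical CNN image feature
txts = rng.normal(size=(100, 300))       # hypothetical sentence embeddings
W = rng.normal(size=(4096, 300)) * 0.01  # stands in for a learned projection
top_idx, _ = retrieve_text(img, txts, W)
print("top-5 candidate sentence indices:", top_idx)
```

Text-to-image retrieval works symmetrically by ranking image features against a sentence embedding in the same space.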