Paper Information - Image Retrieval with Data Augmentation of Sentence Labels Based on Paraphrasing


Text-based image retrieval (TBIR) is a fundamental research topic in the field of information retrieval. Recent TBIR methods employ deep neural networks (hereinafter referred to as deep neural TBIR) to retrieve a desired image from a sentence query, and they achieve state-of-the-art performance. To further improve the retrieval performance of deep neural TBIR, the training data must contain diverse sentence labels, but preparing such labels takes considerable effort. To address this problem, we propose a novel deep neural TBIR method that augments the sentence labels in the training data through paraphrasing. Experimental results show the effectiveness of the proposed method.

Miki Haseyama | Takahiro Ogawa | Ren Togo | Rintaro Yanagi
