Focus Your Attention: A Bidirectional Focal Attention Network for Image-Text Matching
Yongdong Zhang | Chunxiao Liu | Zhendong Mao | Tianzhu Zhang | Bin Wang | An-An Liu