[1] Anton van den Hengel,et al. Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[2] Byoung-Tak Zhang,et al. Bilinear Attention Networks , 2018, NeurIPS.
[3] Martial Hebert,et al. Learning by Asking Questions , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[4] Ricardo Campos,et al. YAKE! Keyword extraction from single documents using multiple local features , 2020, Inf. Sci.
[5] Yike Guo,et al. A visual attention-based keyword extraction for document classification , 2018, Multimedia Tools and Applications.
[6] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[7] Jason Weston,et al. Memory Networks , 2014, ICLR.
[8] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[9] Martin Jaggi,et al. Simple Unsupervised Keyphrase Extraction using Sentence Embeddings , 2018, CoNLL.
[10] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[11] Zhou Yu,et al. Deep Modular Co-Attention Networks for Visual Question Answering , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Nick Cramer,et al. Automatic Keyword Extraction from Individual Documents , 2010.
[13] José M. F. Moura,et al. Visual Dialog , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Vahid Kazemi,et al. Show, Ask, Attend, and Answer: A Strong Baseline For Visual Question Answering , 2017, ArXiv.
[15] Kate Saenko,et al. Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering , 2015, ECCV.
[16] Juan Enrique Ramos,et al. Using TF-IDF to Determine Word Relevance in Document Queries , 2003.
[17] Christopher D. Manning,et al. GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Zhou Yu,et al. Multi-modal Factorized Bilinear Pooling with Co-attention Learning for Visual Question Answering , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[19] Jason Weston,et al. End-To-End Memory Networks , 2015, NIPS.
[20] Tatsuya Harada,et al. The Color of the Cat is Gray: 1 Million Full-Sentences Visual Question Answering (FSVQA) , 2016, ArXiv.
[21] Tatsuya Harada,et al. Visual Question Generation for Class Acquisition of Unknown Objects , 2018, ECCV.
[22] Rada Mihalcea,et al. TextRank: Bringing Order into Text , 2004, EMNLP.
[23] Lei Zhang,et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[24] Xiaojun Wan,et al. Single Document Keyphrase Extraction Using Neighborhood Knowledge , 2008, AAAI.
[25] Richard Socher,et al. Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[26] Diyi Yang,et al. Hierarchical Attention Networks for Document Classification , 2016, NAACL.
[27] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[28] Samy Bengio,et al. Generating Sentences from a Continuous Space , 2015, CoNLL.
[29] Yash Goyal,et al. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[31] Sanja Fidler,et al. Learning to Caption Images Through a Lifetime by Asking Questions , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[32] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[33] Trevor Darrell,et al. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding , 2016, EMNLP.
[34] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Michael S. Bernstein,et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.
[36] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[37] Geoffrey E. Hinton,et al. Layer Normalization , 2016, ArXiv.
[38] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.