A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering
Feng Gao | Qing Ping | Govind Thattai | Aishwarya N. Reganti | Ying Nian Wu | Prem Natarajan
[1] Margaret Mitchell, et al. VQA: Visual Question Answering, 2015, International Journal of Computer Vision.
[2] Fengmao Lv, et al. TRRNet: Tiered Relation Reasoning for Compositional Visual Question Answering, 2020, ECCV.
[3] Dhruv Batra, et al. Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?, 2016, EMNLP.
[4] Zhou Yu, et al. Beyond Bilinear: Generalized Multimodal Factorized High-Order Pooling for Visual Question Answering, 2017, IEEE Transactions on Neural Networks and Learning Systems.
[5] Jun Yan, et al. Scalable Multi-Hop Relational Reasoning for Knowledge-Aware Question Answering, 2020, EMNLP.
[6] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.
[7] Jian Zhang, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.
[8] Peter Clark, et al. GenericsKB: A Knowledge Base of Generic Statements, 2020, arXiv.
[9] Partha Pratim Talukdar, et al. KVQA: Knowledge-Aware Visual Question Answering, 2019, AAAI.
[10] Edouard Grave, et al. Distilling Knowledge from Reader to Retriever for Question Answering, 2020, arXiv.
[11] Michael S. Bernstein, et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations, 2016, International Journal of Computer Vision.
[12] Mohit Bansal, et al. LXMERT: Learning Cross-Modality Encoder Representations from Transformers, 2019, EMNLP.
[13] Zhou Yu, et al. Deep Modular Co-Attention Networks for Visual Question Answering, 2019, CVPR.
[14] Jeff Johnson, et al. Billion-Scale Similarity Search with GPUs, 2017, IEEE Transactions on Big Data.
[15] Eneko Agirre, et al. Image Captioning for Effective Use of Language Models in Knowledge-Based Visual Question Answering, 2021, arXiv.
[16] Anton van den Hengel, et al. Reasoning over Vision and Language: Exploring the Benefits of Supplemental Knowledge, 2021, LANTERN.
[17] Dacheng Tao, et al. Bilinear Graph Networks for Visual Question Answering, 2019.
[18] Xinlei Chen, et al. In Defense of Grid Features for Visual Question Answering, 2020, CVPR.
[19] Marcus Rohrbach, et al. KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA, 2021, CVPR.
[20] Percy Liang, et al. Know What You Don’t Know: Unanswerable Questions for SQuAD, 2018, ACL.
[21] Hugo Zaragoza, et al. The Probabilistic Relevance Framework: BM25 and Beyond, 2009, Foundations and Trends in Information Retrieval.
[22] Catherine Havasi, et al. ConceptNet 5.5: An Open Multilingual Graph of General Knowledge, 2016, AAAI.
[23] Pratyay Banerjee, et al. Weakly-Supervised Visual-Retriever-Reader for Knowledge-Based Question Answering, 2021, EMNLP.
[24] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[25] Zhe Gan, et al. An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA, 2021, arXiv.
[26] Xinlei Chen, et al. Towards VQA Models That Can Read, 2019, CVPR.
[27] Soumen Chakrabarti, et al. Select, Substitute, Search: A New Benchmark for Knowledge-Augmented Visual Question Answering, 2021, SIGIR.
[28] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, Journal of Machine Learning Research.
[29] Chuang Gan, et al. Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding, 2018, NeurIPS.
[30] Marcus Rohrbach, et al. Probabilistic Neural-Symbolic Models for Interpretable Visual Question Answering, 2019, ICML.
[31] Stefan Lee, et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks, 2019, NeurIPS.
[32] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[33] Ming-Wei Chang, et al. Natural Questions: A Benchmark for Question Answering Research, 2019, TACL.
[34] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NeurIPS.
[35] Christopher Clark, et al. Simple and Effective Multi-Paragraph Reading Comprehension, 2017, ACL.
[36] Juan Carlos Niebles, et al. Interpretable Visual Question Answering by Visual Grounding from Attention Supervision Mining, 2019, WACV.
[37] Edouard Grave, et al. Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering, 2020, EACL.
[38] Jianfeng Gao, et al. Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks, 2020, ECCV.
[39] Sanja Fidler, et al. MovieQA: Understanding Stories in Movies through Question-Answering, 2016, CVPR.
[40] Nan Duan, et al. Graph-Based Reasoning over Heterogeneous External Knowledge for Commonsense Question Answering, 2019, AAAI.
[41] Danqi Chen, et al. Dense Passage Retrieval for Open-Domain Question Answering, 2020, EMNLP.
[42] Jure Leskovec, et al. QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering, 2021, NAACL.
[43] Chunhua Shen, et al. Explicit Knowledge-based Reasoning for Visual Question Answering, 2015, IJCAI.
[44] Yu Cheng, et al. Large-Scale Adversarial Training for Vision-and-Language Representation Learning, 2020, NeurIPS.
[45] Ruslan Salakhutdinov, et al. Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text, 2018, EMNLP.
[46] Ahmed El Kholy, et al. UNITER: Learning UNiversal Image-TExt Representations, 2020, ECCV.
[47] Jimmy J. Lin, et al. End-to-End Open-Domain Question Answering with BERTserini, 2019, NAACL.
[48] Jason Weston, et al. Reading Wikipedia to Answer Open-Domain Questions, 2017, ACL.
[49] Xiaoyan Wang, et al. Improving Natural Language Inference Using External Knowledge in the Science Questions Domain, 2018, AAAI.
[50] Lei Zhang, et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering, 2018, CVPR.
[51] Christopher D. Manning, et al. GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering, 2019, CVPR.
[52] Ali Farhadi, et al. OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge, 2019, CVPR.
[53] Lei Zhang, et al. VinVL: Making Visual Representations Matter in Vision-Language Models, 2021, arXiv.
[54] Byoung-Tak Zhang, et al. Bilinear Attention Networks, 2018, NeurIPS.
[55] François Gardères, et al. ConceptBert: Concept-Aware Representation for Visual Question Answering, 2020, Findings of EMNLP.
[56] Qi Wu, et al. FVQA: Fact-Based Visual Question Answering, 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[57] Wei Zhang, et al. R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering, 2018, KDD.
[58] Roozbeh Mottaghi, et al. Multi-Modal Answer Validation for Knowledge-Based VQA, 2021, AAAI.
[59] ‘Just because you are right, doesn’t mean I am wrong’: Overcoming a Bottleneck in Development and Evaluation of Open-Ended VQA Tasks, 2021, EACL.
[60] Li Fei-Fei, et al. CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning, 2017, CVPR.
[61] Xiang Ren, et al. KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning, 2019, EMNLP.
[62] Ming-Wei Chang, et al. Latent Retrieval for Weakly Supervised Open Domain Question Answering, 2019, ACL.
[63] Ramesh Nallapati, et al. Multi-passage BERT: A Globally Normalized BERT Model for Open-Domain Question Answering, 2019, EMNLP.
[64] Pengchuan Zhang, et al. Image Scene Graph Generation (SGG) Benchmark, 2021, arXiv.
[65] Can Gao, et al. UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning, 2021, ACL/IJCNLP.
[66] Trevor Darrell, et al. Learning to Reason: End-to-End Module Networks for Visual Question Answering, 2017, ICCV.