Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories
J. Uijlings | V. Ferrari | Fei Sha | Thomas Mensink | Lluís Castrejón | A. Goel | A. Araújo | Howard Zhou | Felipe Cadar
[1] Alan Ritter, et al. Can Pre-trained Vision and Language Models Answer Visual Information-Seeking Questions?, 2023, ArXiv.
[2] David A. Ross, et al. Reveal: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory, 2022, CVPR.
[3] Noah A. Smith, et al. PromptCap: Prompt-Guided Task-Aware Image Captioning, 2022, ArXiv.
[4] Andrew M. Dai, et al. Scaling Instruction-Finetuned Language Models, 2022, ArXiv.
[5] Ashish V. Thapliyal, et al. PaLI: A Jointly-Scaled Multilingual Language-Image Model, 2022, ICLR.
[6] Li Dong, et al. Language Models are General-Purpose Interfaces, 2022, ArXiv.
[7] Dustin Schwenk, et al. A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge, 2022, ECCV.
[8] Lu Yuan, et al. REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering, 2022, NeurIPS.
[9] Radu Soricut, et al. All You May Need for VQA are Image Captions, 2022, NAACL.
[10] Oriol Vinyals, et al. Flamingo: a Visual Language Model for Few-Shot Learning, 2022, NeurIPS.
[11] Andrew M. Dai, et al. PaLM: Scaling Language Modeling with Pathways, 2022, J. Mach. Learn. Res.
[12] C. Buck, et al. Tomayto, Tomahto. Beyond Token-level Answer Equivalence for Question Answering Evaluation, 2022, EMNLP.
[13] Dale Schuurmans, et al. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, 2022, NeurIPS.
[14] Dmytro Okhonko, et al. CM3: A Causal Masked Multimodal Model of the Internet, 2022, ArXiv.
[15] Yonatan Bisk, et al. KAT: A Knowledge Augmented Transformer for Vision-and-Language, 2021, NAACL.
[16] Diego de Las Casas, et al. Improving language models by retrieving from trillions of tokens, 2021, ICML.
[17] Po-Sen Huang, et al. Scaling Language Models: Methods, Analysis & Insights from Training Gopher, 2021, ArXiv.
[18] Zhe Gan, et al. An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA, 2021, AAAI.
[19] Serge J. Belongie, et al. Benchmarking Representation Learning for Natural World Image Collections, 2021, CVPR.
[20] Soumen Chakrabarti, et al. Select, Substitute, Search: A New Benchmark for Knowledge-Augmented Visual Question Answering, 2021, SIGIR.
[21] Jiecao Chen, et al. WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning, 2021, SIGIR.
[22] Ilya Sutskever, et al. Learning Transferable Visual Models From Natural Language Supervision, 2021, ICML.
[23] Quoc V. Le, et al. Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision, 2021, ICML.
[24] Marcus Rohrbach, et al. KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA, 2020, CVPR.
[25] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[26] Fabio Petroni, et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, 2020, NeurIPS.
[27] Tobias Weyand, et al. Google Landmarks Dataset v2 – A Large-Scale Benchmark for Instance-Level Recognition and Retrieval, 2020, CVPR.
[28] Ming-Wei Chang, et al. REALM: Retrieval-Augmented Language Model Pre-Training, 2020, ICML.
[29] Omer Levy, et al. Generalization through Memorization: Nearest Neighbor Language Models, 2019, ICLR.
[30] Ashish Sabharwal, et al. QASC: A Dataset for Question Answering via Sentence Composition, 2019, AAAI.
[31] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[32] Yu Cheng, et al. UNITER: UNiversal Image-TExt Representation Learning, 2019, ECCV.
[33] Iryna Gurevych, et al. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, 2019, EMNLP.
[34] Partha Pratim Talukdar, et al. KVQA: Knowledge-Aware Visual Question Answering, 2019, AAAI.
[35] Ali Farhadi, et al. OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge, 2019, CVPR.
[36] Yoshua Bengio, et al. HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering, 2018, EMNLP.
[37] Peter Clark, et al. Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering, 2018, EMNLP.
[38] Jonathan Berant, et al. The Web as a Knowledge-Base for Answering Complex Questions, 2018, NAACL.
[39] Sebastian Riedel, et al. Constructing Datasets for Multi-hop Reading Comprehension Across Documents, 2017, TACL.
[40] Kyunghyun Cho, et al. SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine, 2017, ArXiv.
[41] Yash Goyal, et al. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering, 2016, International Journal of Computer Vision.
[42] Qi Wu, et al. FVQA: Fact-Based Visual Question Answering, 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[43] Jian Zhang, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.
[44] Licheng Yu, et al. Visual Madlibs: Fill in the blank Image Generation and Question Answering, 2015, ArXiv.
[45] Wei Xu, et al. Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering, 2015, NIPS.
[46] Richard S. Zemel, et al. Exploring Models and Data for Image Question Answering, 2015, NIPS.
[47] Margaret Mitchell, et al. VQA: Visual Question Answering, 2015, International Journal of Computer Vision.
[48] Mario Fritz, et al. A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input, 2014, NIPS.
[49] Loïc Barrault, et al. In Factuality: Efficient Integration of Relevant Facts for Visual Question Answering, 2021, ACL.
[50] Eleanor Rosch. Principles of Categorization, 1978.