Combo of Thinking and Observing for Outside-Knowledge VQA
Zheng Lin | Yuchen Mo | Weiping Wang | Huishan Ji | Qingyi Si
[1] Chang Xu,et al. Bilinear Graph Networks for Visual Question Answering , 2019, IEEE Transactions on Neural Networks and Learning Systems.
[2] Jie Zhou,et al. Language Prior Is Not the Only Shortcut: A Benchmark for Shortcut Learning in VQA , 2022, EMNLP.
[4] Ashish V. Thapliyal,et al. PaLI: A Jointly-Scaled Multilingual Language-Image Model , 2022, ICLR.
[5] Lu Yuan,et al. REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering , 2022, NeurIPS.
[6] Aishwarya N. Reganti,et al. Transform-Retrieve-Generate: Natural Language-Centric Outside-Knowledge Visual Question Answering , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Yuta Nakashima,et al. Gender and Racial Bias in Visual Question Answering Datasets , 2022, FAccT.
[8] Oriol Vinyals,et al. Flamingo: a Visual Language Model for Few-Shot Learning , 2022, NeurIPS.
[9] Qi Wu,et al. MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Jingren Zhou,et al. OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework , 2022, ICML.
[11] Yonatan Bisk,et al. KAT: A Knowledge Augmented Transformer for Vision-and-Language , 2021, NAACL.
[12] Zhe Gan,et al. An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA , 2021, AAAI.
[13] Roozbeh Mottaghi,et al. Multi-Modal Answer Validation for Knowledge-Based VQA , 2021, AAAI.
[14] Liang Lin,et al. Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding , 2020, IEEE Transactions on Neural Networks and Learning Systems.
[15] Pratyay Banerjee,et al. Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering , 2021, EMNLP.
[16] Matthieu Cord,et al. Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[17] Fei-Yue Wang,et al. KM4: Visual reasoning via Knowledge Embedding Memory Model with Mutual Modulation , 2021, Inf. Fusion.
[18] Anton van den Hengel,et al. Reasoning over Vision and Language: Exploring the Benefits of Supplemental Knowledge , 2021, LANTERN.
[19] Lei Zhang,et al. VinVL: Making Visual Representations Matter in Vision-Language Models , 2021, ArXiv.
[20] Hua Wu,et al. UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning , 2020, ACL.
[21] Marcus Rohrbach,et al. KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Jeff Johnson,et al. Billion-Scale Similarity Search with GPUs , 2017, IEEE Transactions on Big Data.
[23] François Gardères,et al. ConceptBert: Concept-Aware Representation for Visual Question Answering , 2020, Findings of EMNLP.
[24] Xin Wang,et al. Boosting Visual Question Answering with Context-aware Knowledge Aggregation , 2020, ACM Multimedia.
[25] Weifeng Zhang,et al. Cross-modal Knowledge Reasoning for Knowledge-based Visual Question Answering , 2020, Pattern Recognit..
[26] Yujing Wang,et al. Mucko: Multi-Layer Cross-Modal Knowledge Reasoning for Fact-based Visual Question Answering , 2020, IJCAI.
[27] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[28] Danqi Chen,et al. Dense Passage Retrieval for Open-Domain Question Answering , 2020, EMNLP.
[29] Xinlei Chen,et al. In Defense of Grid Features for Visual Question Answering , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Colin Raffel,et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res..
[31] Lysandre Debut,et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing , 2019, ArXiv.
[32] Ahmed El Kholy,et al. UNITER: Learning UNiversal Image-TExt Representations , 2019, ECCV 2020.
[33] Mohit Bansal,et al. LXMERT: Learning Cross-Modality Encoder Representations from Transformers , 2019, EMNLP.
[34] Ming-Wei Chang,et al. Natural Questions: A Benchmark for Question Answering Research , 2019, TACL.
[35] Zhou Yu,et al. Deep Modular Co-Attention Networks for Visual Question Answering , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Ming-Wei Chang,et al. Latent Retrieval for Weakly Supervised Open Domain Question Answering , 2019, ACL.
[37] Ali Farhadi,et al. OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Xinlei Chen,et al. Towards VQA Models That Can Read , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[39] Christopher D. Manning,et al. GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[40] Larry S. Davis,et al. Explicit Bias Discovery in Visual Question Answering Models , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[41] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[42] Byoung-Tak Zhang,et al. Bilinear Attention Networks , 2018, NeurIPS.
[43] Dhruv Batra,et al. Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[44] Eunsol Choi,et al. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension , 2017, ACL.
[45] Jason Weston,et al. Reading Wikipedia to Answer Open-Domain Questions , 2017, ACL.
[46] Li Fei-Fei,et al. CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[47] Catherine Havasi,et al. ConceptNet 5.5: An Open Multilingual Graph of General Knowledge , 2016, AAAI.
[48] Yash Goyal,et al. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering , 2016, International Journal of Computer Vision.
[49] Chunhua Shen,et al. Explicit Knowledge-based Reasoning for Visual Question Answering , 2015, IJCAI.
[50] Dhruv Batra,et al. Analyzing the Behavior of Visual Question Answering Models , 2016, EMNLP.
[51] Michael S. Bernstein,et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.
[52] Sanja Fidler,et al. MovieQA: Understanding Stories in Movies through Question-Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[53] Petr Baudis,et al. Modeling of the Question Answering Task in the YodaQA System , 2015, CLEF.
[54] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[55] Andrew Chou,et al. Semantic Parsing on Freebase from Question-Answer Pairs , 2013, EMNLP.