Linguistic issues behind visual question answering
[1] Elia Bruni,et al. The PhotoBook Dataset: Building Common Ground through Visually-Grounded Dialogue , 2019, ACL.
[2] Albert Gatt,et al. Seeing past words: Testing the cross-modal capabilities of pretrained V&L models on counting tasks , 2020, MMSR.
[3] Qi Wu,et al. FVQA: Fact-Based Visual Question Answering , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[4] David Schlangen,et al. Know What You Don’t Know: Modeling a Pragmatic Speaker that Refers to Objects of Unknown Categories , 2019, ACL.
[5] Sandro Pezzelle,et al. Big Generalizations with Small Data: Exploring the Role of Training Samples in Learning Adjectives of Size , 2019, EMNLP.
[6] Yoshua Bengio,et al. FigureQA: An Annotated Figure Dataset for Visual Reasoning , 2017, ICLR.
[7] Terry Winograd,et al. Understanding natural language , 1974.
[8] Mario Fritz,et al. Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[9] Christopher Kanan,et al. TallyQA: Answering Complex Counting Questions , 2018, AAAI.
[10] Stefan Lee,et al. Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[11] Christopher Potts,et al. Colors in Context: A Pragmatic Neural Model for Grounded Language Understanding , 2017, TACL.
[12] L. Barsalou. Grounded cognition , 2008, Annual Review of Psychology.
[13] Walter Daelemans,et al. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) , 2014, EMNLP 2014.
[14] Yoav Artzi,et al. A Corpus of Natural Language for Visual Reasoning , 2017, ACL.
[15] Francis Ferraro,et al. Visual Storytelling , 2016, NAACL.
[16] Jianfeng Gao,et al. Unified Vision-Language Pre-Training for Image Captioning and VQA , 2020, AAAI.
[17] Koji Mineshima,et al. Multimodal Logical Inference System for Visual-Textual Entailment , 2019, ACL.
[18] Alexander J. Smola,et al. Stacked Attention Networks for Image Question Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Anoop Cherian,et al. End-to-end Audio Visual Scene-aware Dialog Using Multimodal Attention-based Video Features , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Leonidas J. Guibas,et al. Shapeglot: Learning Language for Shape Differentiation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[21] Christopher D. Manning,et al. GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Henning Müller,et al. Overview of the VQA-Med Task at ImageCLEF 2021: Visual Question Answering and Generation in the Medical Domain , 2020, CLEF.
[23] Raquel Fernández,et al. The Devil is in the Details: A Magnifying Glass for the GuessWhich Visual Dialogue Game , 2019.
[24] Mario Fritz,et al. A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input , 2014, NIPS.
[25] Tal Linzen,et al. How Can We Accelerate Progress Towards Human-like Linguistic Generalization? , 2020, ACL.
[26] Asim Kadav,et al. Visual Entailment: A Novel Task for Fine-Grained Image Understanding , 2019, ArXiv.
[27] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[28] Dhruv Batra,et al. Analyzing the Behavior of Visual Question Answering Models , 2016, EMNLP.
[29] R. Borsley,et al. Head-Driven Phrase Structure Grammar: The handbook , 2018.
[30] Yoav Artzi,et al. A Corpus for Reasoning about Natural Language Grounded in Photographs , 2018, ACL.
[31] Christopher Kennedy. Vagueness and grammar: the semantics of relative and absolute gradable adjectives , 2007.
[32] Sandro Pezzelle,et al. Be Precise or Fuzzy: Learning the Meaning of Cardinals and Quantifiers from Vision , 2017, EACL.
[33] Dhruv Batra,et al. Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[34] Zellig S. Harris,et al. Distributional Structure , 1954.
[35] Jean Maillard,et al. Black Holes and White Rabbits: Metaphor Identification with Visual Features , 2016, NAACL.
[36] Sandro Pezzelle,et al. Is the Red Square Big? MALeViC: Modeling Adjectives Leveraging Visual Contexts , 2019, EMNLP.
[37] Raffaella Bernardi,et al. Beyond task success: A closer look at jointly learning to see, ask, and GuessWhat , 2018, NAACL.
[38] Furu Wei,et al. VL-BERT: Pre-training of Generic Visual-Linguistic Representations , 2019, ICLR.
[39] Sandro Pezzelle,et al. FOIL it! Find One mismatch between Image and Language caption , 2017, ACL.
[40] Chuang Gan,et al. The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences from Natural Supervision , 2019, ICLR.
[41] Li Fei-Fei,et al. CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Raffaella Bernardi,et al. There Is No Logical Negation Here, But There Are Alternatives: Modeling Conversational Negation with Distributional Semantics , 2016, Computational Linguistics.
[43] Christopher Kanan,et al. An Analysis of Visual Question Answering Algorithms , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[44] Christopher Potts,et al. Pragmatically Informative Image Captioning with Character-Level Inference , 2018, NAACL.
[45] Yash Goyal,et al. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[46] Yash Goyal,et al. Yin and Yang: Balancing and Answering Binary Visual Questions , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[47] Chitta Baral,et al. VQA-LOL: Visual Question Answering under the Lens of Logic , 2020, ECCV.
[48] Julia Hockenmaier,et al. Learning to execute instructions in a Minecraft dialogue , 2020, ACL.
[49] Wei Han,et al. Finding the Evidence: Localization-aware Answer Prediction for Text Visual Question Answering , 2020, COLING.
[50] Afsaneh Fazly,et al. A Probabilistic Computational Model of Cross-Situational Word Learning , 2010, Cogn. Sci..
[51] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[52] Michael S. Bernstein,et al. Visual7W: Grounded Question Answering in Images , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[53] Jiebo Luo,et al. VizWiz Grand Challenge: Answering Visual Questions from Blind People , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[54] Simeon Schüz,et al. Knowledge Supports Visual Language Grounding: A Case Study on Colour Terms , 2020, ACL.
[55] Lucia Specia,et al. Object Counts! Bringing Explicit Detections Back into Image Captioning , 2018, NAACL.
[56] Lei Zhang,et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[57] Cho-Jui Hsieh,et al. VisualBERT: A Simple and Performant Baseline for Vision and Language , 2019, ArXiv.
[58] Sandro Pezzelle,et al. “Look, some Green Circles!”: Learning to Quantify from Images , 2016, VL@ACL.
[59] Yoav Artzi,et al. TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[60] Christopher Kanan,et al. Visual question answering: Datasets, algorithms, and future challenges , 2016, Comput. Vis. Image Underst..
[61] John R. Searle,et al. Minds, brains, and programs , 1980, Behavioral and Brain Sciences.
[62] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[63] Trevor Darrell,et al. Learning to Reason: End-to-End Module Networks for Visual Question Answering , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[64] B. Partee. Lexical semantics and compositionality , 1995.
[65] Anton van den Hengel,et al. Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[66] Razvan Pascanu,et al. A simple neural network module for relational reasoning , 2017, NIPS.
[67] Dan Klein,et al. Neural Module Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[68] Verena Rieser,et al. History for Visual Dialog: Do we really need it? , 2020, ACL.
[69] Michael S. Bernstein,et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.
[70] Qi Wu,et al. Visual question answering: A survey of methods and datasets , 2016, Comput. Vis. Image Underst..
[71] Willard Van Orman Quine,et al. Word and Object , 1960.
[72] Brian L. Price,et al. DVQA: Understanding Data Visualizations via Question Answering , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[73] Michael C. Frank,et al. A pragmatic account of the processing of negative sentences , 2014, CogSci.
[74] Rob Miller,et al. VizWiz: nearly real-time answers to visual questions , 2010, UIST.
[75] T. Landauer,et al. A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge , 1997.
[76] Donald Geman,et al. Visual Turing test for computer vision systems , 2015, Proceedings of the National Academy of Sciences.
[77] Ajay Divakaran,et al. Sunny and Dark Outside?! Improving Answer Consistency in VQA through Entailed Question Generation , 2019, EMNLP.
[78] Vicente Ordonez,et al. Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries , 2019, NeurIPS.
[79] Richard S. Zemel,et al. Exploring Models and Data for Image Question Answering , 2015, NIPS.
[80] Christopher Potts,et al. Learning to Generate Compositional Color Descriptions , 2016, EMNLP.
[81] Carina Silberer,et al. Visually Grounded Meaning Representations , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[82] Li Fei-Fei,et al. Inferring and Executing Programs for Visual Reasoning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[83] Brenden M. Lake,et al. Mutual exclusivity as a challenge for deep neural networks , 2019, NeurIPS.
[84] Mariella Dimiccoli,et al. Learning quantification from images: A structured neural architecture , 2018, Nat. Lang. Eng..
[85] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[86] R. Bernardi,et al. Quantifiers in a Multimodal World: Hallucinating Vision with Language and Sound , 2019, Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics.
[87] Sandro Pezzelle,et al. Comparatives, Quantifiers, Proportions: A Multi-Task Model for the Learning of Quantities from Vision , 2018, NAACL-HLT.
[88] J. Ginzburg,et al. Wh-Questions are understood before polar-questions: Evidence from English, German, and Chinese , 2020, Journal of Child Language.
[89] Qi Wu,et al. Visual Question Answering: A Tutorial , 2017, IEEE Signal Processing Magazine.
[90] Chunhua Shen,et al. Explicit Knowledge-based Reasoning for Visual Question Answering , 2015, IJCAI.
[91] Yuandong Tian,et al. Simple Baseline for Visual Question Answering , 2015, ArXiv.
[92] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[93] Stefan Lee,et al. Embodied Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[94] Dan Klein,et al. Reasoning about Pragmatics with Neural Listeners and Speakers , 2016, EMNLP.
[95] Martial Hebert,et al. Patch to the Future: Unsupervised Visual Prediction , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[96] Yu Cheng,et al. UNITER: UNiversal Image-TExt Representation Learning , 2019, ECCV.
[97] Louis-Philippe Morency,et al. Using Syntax to Ground Referring Expressions in Natural Images , 2018, AAAI.
[98] Roser Morante,et al. Pragmatic Factors in Image Description: The Case of Negations , 2016, VL@ACL.
[99] Snehasis Mukherjee,et al. Visual Question Answering using Deep Learning: A Survey and Performance Analysis , 2019, ArXiv.
[100] Oliver Lemon,et al. Imagining Grounded Conceptual Representations from Perceptual Information in Situated Guessing Games , 2020, COLING.
[101] Christian Wolf,et al. Roses are Red, Violets are Blue… But Should VQA expect Them To? , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[102] Albert Gatt,et al. Grounded Textual Entailment , 2018, COLING.
[103] Wei Xu,et al. Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question , 2015, NIPS.
[104] Sandro Pezzelle,et al. Be Different to Be Better! A Benchmark to Leverage the Complementarity of Language and Vision , 2020, Findings of EMNLP.
[105] Ali Farhadi,et al. Situation Recognition: Visual Semantic Role Labeling for Image Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[106] Joshua B. Tenenbaum,et al. Building machines that learn and think like people , 2016, Behavioral and Brain Sciences.
[107] Elia Bruni,et al. Multimodal Distributional Semantics , 2014, J. Artif. Intell. Res..
[108] Marco Baroni,et al. Grounding Distributional Semantics in the Visual World , 2016, Lang. Linguistics Compass.
[109] Mohit Bansal,et al. LXMERT: Learning Cross-Modality Encoder Representations from Transformers , 2019, EMNLP.
[110] Jiasen Lu,et al. Hierarchical Question-Image Co-Attention for Visual Question Answering , 2016, NIPS.
[111] Hugo Larochelle,et al. GuessWhat?! Visual Object Discovery through Multi-modal Dialogue , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[112] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[113] Xiao Lin,et al. Don't just listen, use your imagination: Leveraging visual common sense for non-visual tasks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[114] Alexander Kuhnle,et al. ShapeWorld - A new test methodology for multimodal language understanding , 2017, ArXiv.
[115] Christopher Kanan,et al. Challenges and Prospects in Vision and Language Research , 2019, Front. Artif. Intell..
[116] Licheng Yu,et al. TVQA+: Spatio-Temporal Grounding for Video Question Answering , 2019, ACL.
[117] J. Firth,et al. Papers in linguistics, 1934-1951 , 1957.
[118] M. Tomasello,et al. Social cognition, joint attention, and communicative competence from 9 to 15 months of age , 1998, Monographs of the Society for Research in Child Development.
[119] Chuang Gan,et al. Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding , 2018, NeurIPS.
[120] Eric Horvitz,et al. SQuINTing at VQA Models: Interrogating VQA Models with Sub-Questions , 2020, ArXiv.
[121] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[122] Parisa Kordjamshidi,et al. Cross-Modality Relevance for Reasoning on Language and Vision , 2020, ACL.
[123] Jacob Andreas,et al. Experience Grounds Language , 2020, EMNLP.
[124] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[125] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[126] Ramprasaath R. Selvaraju,et al. Counting Everyday Objects in Everyday Scenes , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[127] Yash Goyal,et al. Towards Transparent AI Systems: Interpreting Visual Question Answering Models , 2016, ArXiv, abs/1608.08974.
[128] Gordon Christie,et al. Resolving Language and Vision Ambiguities Together: Joint Segmentation & Prepositional Attachment Resolution in Captioned Scenes , 2016, EMNLP.
[129] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[130] Sina Zarrieß,et al. Tell Me More: A Dataset of Visual Scene Description Sequences , 2019, INLG.
[131] Luciana Benotti,et al. On the role of effective and referring questions in GuessWhat?! , 2020, ALVR.
[132] Chitta Baral,et al. MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering , 2020, EMNLP.
[133] Licheng Yu,et al. Visual Madlibs: Fill in the Blank Description Generation and Question Answering , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[134] Shimon Ullman,et al. Do You See What I Mean? Visual Resolution of Linguistic Ambiguities , 2015, EMNLP.
[135] Ernest Valveny,et al. Scene Text Visual Question Answering , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[136] Bonnie L. Webber,et al. Special issue on interactive question answering: Introduction , 2009, Natural Language Engineering.
[137] Binsu C. Kovoor,et al. Visual question answering: a state-of-the-art review , 2020, Artificial Intelligence Review.
[138] Stefan Lee,et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks , 2019, NeurIPS.
[139] Licheng Yu,et al. TVQA: Localized, Compositional Video Question Answering , 2018, EMNLP.