Yoav Artzi | Alane Suhr | Iris Zhang | Stephanie Zhou | Huajun Bai
[1] Francis Ferraro, et al. A Survey of Current Datasets for Vision and Language Research, 2015, EMNLP.
[2] Trevor Darrell, et al. Explainable Neural Computation via Stack Neural Module Networks, 2018, ECCV.
[3] Asim Kadav, et al. Visual Entailment: A Novel Task for Fine-Grained Image Understanding, 2019, ArXiv.
[4] José M. F. Moura, et al. Visual Dialog, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Wenhu Chen, et al. Bootstrap, Review, Decode: Using Out-of-Domain Textual Data to Improve Image Captioning, 2016, ArXiv.
[6] Dhruv Batra, et al. C-VQA: A Compositional Split of the Visual Question Answering (VQA) v1.0 Dataset, 2017, ArXiv.
[7] Louis-Philippe Morency, et al. Using Syntax to Ground Referring Expressions in Natural Images, 2018, AAAI.
[8] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.
[9] Licheng Yu, et al. TVQA: Localized, Compositional Video Question Answering, 2018, EMNLP.
[10] Ross A. Knepper, et al. Mapping Navigation Instructions to Continuous Control Actions with Position-Visitation Prediction, 2018, CoRL.
[11] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[12] George A. Miller. WordNet: A Lexical Database for English, 1995, HLT.
[13] Ali Farhadi, et al. From Recognition to Cognition: Visual Commonsense Reasoning, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Héctor Allende, et al. Working Memory Networks: Augmenting Memory Networks with a Relational Reasoning Module, 2018, ACL.
[15] Aaron C. Courville, et al. FiLM: Visual Reasoning with a General Conditioning Layer, 2017, AAAI.
[16] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[17] Dhruv Batra, et al. Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[18] Xinlei Chen, et al. Microsoft COCO Captions: Data Collection and Evaluation Server, 2015, ArXiv.
[19] Alexander Kuhnle, et al. ShapeWorld - A new test methodology for multimodal language understanding, 2017, ArXiv.
[20] Jason Weston, et al. Talk the Walk: Navigating New York City through Grounded Dialogue, 2018, ArXiv.
[21] Michael S. Bernstein, et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.
[22] Vicente Ordonez, et al. ReferItGame: Referring to Objects in Photographs of Natural Scenes, 2014, EMNLP.
[23] Benjamin Kuipers, et al. Walk the Talk: Connecting Language, Knowledge, and Action in Route Instructions, 2006, AAAI.
[24] Sergey Ioffe, et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, 2016, AAAI.
[25] Chen Huang, et al. Learning to Disambiguate by Asking Discriminative Questions, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[26] Hao Tan, et al. Object Ordering with Bidirectional Matchings for Visual Reasoning, 2018, NAACL-HLT.
[27] Justin Johnson, et al. DDRprog: A CLEVR Differentiable Dynamic Reasoning Programmer, 2018, ArXiv.
[28] Xiao-Jing Wang, et al. A dataset and architecture for visual reasoning with a working memory, 2018, ECCV.
[29] Luke S. Zettlemoyer, et al. Learning Distributions over Logical Forms for Referring Expression Generation, 2013, EMNLP.
[30] Kees van Deemter, et al. Natural Reference to Objects in a Visual Domain, 2010, INLG.
[31] C. Lawrence Zitnick, et al. Bringing Semantics into Focus Using Visual Abstraction, 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[32] Hexiang Hu, et al. Binary Image Selection (BISON): Interpretable Evaluation of Visual Grounding, 2019, ArXiv.
[33] Christopher D. Manning, et al. Compositional Attention Networks for Machine Reasoning, 2018, ICLR.
[34] Ross B. Girshick, et al. Mask R-CNN, 2017, arXiv:1703.06870.
[35] Alan L. Yuille, et al. Generation and Comprehension of Unambiguous Object Descriptions, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[36] J. R. Landis, et al. The measurement of observer agreement for categorical data, 1977, Biometrics.
[37] Andrew Bennett, et al. Mapping Instructions to Actions in 3D Environments with Visual Goal Prediction, 2018, EMNLP.
[38] Jonathan Berant, et al. Weakly Supervised Semantic Parsing with Abstract Examples, 2017, ACL.
[39] Yoav Artzi, et al. TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[40] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[41] Yoav Artzi, et al. A Corpus of Natural Language for Visual Reasoning, 2017, ACL.
[42] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.
[43] David Mascharka, et al. Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[44] Bo Xu, et al. Cascaded Mutual Modulation for Visual Reasoning, 2018, EMNLP.
[45] Christopher Kanan, et al. TallyQA: Answering Complex Counting Questions, 2018, AAAI.
[46] Dan Klein, et al. Learning to Compose Neural Networks for Question Answering, 2016, NAACL.
[47] Chris Callison-Burch, et al. Effectively Crowdsourcing Radiology Report Annotations, 2015, Louhi@EMNLP.
[48] Yash Goyal, et al. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[49] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.
[50] Razvan Pascanu, et al. A simple neural network module for relational reasoning, 2017, NIPS.
[51] Jeffrey L. Elman. Finding Structure in Time, 1990, Cogn. Sci.
[52] Kewei Tu, et al. Structured Attentions for Visual Question Answering, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[53] Raymond J. Mooney, et al. Learning to Interpret Natural Language Navigation Instructions from Observations, 2011, Proceedings of the AAAI Conference on Artificial Intelligence.
[54] Daniel Marcu, et al. Towards a Dataset for Human Computer Communication via Grounded Language Acquisition, 2016, AAAI Workshop: Symbiotic Cognitive Systems.
[55] Margaret Mitchell, et al. VQA: Visual Question Answering, 2015, International Journal of Computer Vision.
[56] Yash Goyal, et al. Yin and Yang: Balancing and Answering Binary Visual Questions, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[57] Michael S. Bernstein, et al. Scalable multi-label annotation, 2014, CHI.
[58] Matthew Stone, et al. “Caption” as a Coherence Relation: Evidence and Implications, 2019, Proceedings of the Second Workshop on Shortcomings in Vision and Language.
[59] Qi Wu, et al. Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[60] Dan Klein, et al. Neural Module Networks, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[61] Li Fei-Fei, et al. Inferring and Executing Programs for Visual Reasoning, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[62] Chuang Gan, et al. Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding, 2018, NeurIPS.
[63] Yoshua Bengio, et al. FigureQA: An Annotated Figure Dataset for Visual Reasoning, 2017, ICLR.
[64] Christopher Kanan, et al. An Analysis of Visual Question Answering Algorithms, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[65] Trevor Darrell, et al. Learning to Reason: End-to-End Module Networks for Visual Question Answering, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[66] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[67] Luke S. Zettlemoyer, et al. A Joint Model of Language and Perception for Grounded Attribute Learning, 2012, ICML.
[68] Carl Doersch, et al. Learning Visual Question Answering by Bootstrapping Hard Attention, 2018, ECCV.
[69] Li Fei-Fei, et al. CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[70] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[71] Christopher D. Manning, et al. GQA: a new dataset for compositional question answering over real-world images, 2019, ArXiv.