[1] Zhuoyuan Chen, et al. CraftAssist: A Framework for Dialogue-enabled Interactive Agents, 2019, arXiv.
[2] Noah D. Goodman, et al. Shaping Visual Representations with Language for Few-Shot Classification, 2019, ACL.
[3] Dhruv Batra, et al. C-VQA: A Compositional Split of the Visual Question Answering (VQA) v1.0 Dataset, 2017, arXiv.
[4] Luke S. Zettlemoyer, et al. Bootstrapping Semantic Parsers from Conversations, 2011, EMNLP.
[5] Margaret Mitchell, et al. VQA: Visual Question Answering, 2015, International Journal of Computer Vision.
[6] Prasoon Goyal, et al. Using Natural Language for Reward Shaping in Reinforcement Learning, 2019, IJCAI.
[7] Tom M. Mitchell, et al. Joint Concept Learning and Semantic Parsing from Natural Language Explanations, 2017, EMNLP.
[8] Pushmeet Kohli, et al. Learning to Understand Goal Specifications by Modelling Reward, 2018, ICLR.
[9] Peter Stone, et al. Learning to Interpret Natural Language Commands through Human-Robot Dialog, 2015, IJCAI.
[10] Stefan Lee, et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks, 2019, NeurIPS.
[11] Dan Klein, et al. Neural Module Networks, 2016, CVPR.
[12] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.
[13] Yoav Artzi, et al. Interactive Classification by Asking Informative Questions, 2020, ACL.
[14] Saurabh Gupta, et al. Exploring Nearest Neighbor Approaches for Image Captioning, 2015, arXiv.
[15] Jason Baldridge, et al. Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding, 2020, EMNLP.
[16] Lei Zhang, et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering, 2018, CVPR.
[17] Chuang Gan, et al. Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding, 2018, NeurIPS.
[18] Chuang Gan, et al. The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences from Natural Supervision, 2019, ICLR.
[19] Ilya Sutskever, et al. Zero-Shot Text-to-Image Generation, 2021, ICML.
[20] Wei Xu, et al. Interactive Language Acquisition with One-shot Visual Concept Learning through a Conversational Game, 2018, ACL.
[21] James M. Rehg, et al. Where Are You? Localization from Embodied Dialog, 2020, EMNLP.
[22] Ross A. Knepper, et al. Learning to Map Natural Language Instructions to Physical Quadcopter Control using Simulated Flight, 2019, CoRL.
[23] Heng Tao Shen, et al. Video Captioning With Attention-Based LSTM and Semantic Consistency, 2017, IEEE Transactions on Multimedia.
[24] Trevor Darrell, et al. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding, 2016, EMNLP.
[25] Stefan Lee, et al. Embodied Question Answering, 2018, CVPRW.
[26] Dhruv Batra, et al. Analyzing the Behavior of Visual Question Answering Models, 2016, EMNLP.
[27] Yonatan Bisk, et al. Shifting the Baseline: Single Modality Performance on Visual Navigation & QA, 2019, NAACL.
[28] Pierre-Yves Oudeyer, et al. Language as a Cognitive Tool to Imagine Goals in Curiosity-Driven Exploration, 2020, NeurIPS.
[29] Jacob Andreas, et al. Compositional Explanations of Neurons, 2020, NeurIPS.
[30] Bolei Zhou, et al. Understanding the role of individual units in a deep neural network, 2020, Proceedings of the National Academy of Sciences.
[31] Dan Klein, et al. Speaker-Follower Models for Vision-and-Language Navigation, 2018, NeurIPS.
[32] Xinlei Chen, et al. Order-Aware Generative Modeling Using the 3D-Craft Dataset, 2019, ICCV.
[33] Stefan Lee, et al. Visual Curiosity: Learning to Ask Questions to Learn Visual Recognition, 2018, CoRL.
[34] Yash Goyal, et al. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering, 2017, CVPR.
[35] Trevor Darrell, et al. Object Hallucination in Image Captioning, 2018, EMNLP.
[36] Thomas L. Griffiths, et al. Learning Rewards from Linguistic Feedback, 2020, AAAI.
[37] Christopher D. Manning, et al. GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering, 2019, CVPR.
[38] Qi Wu, et al. Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments, 2018, CVPR.
[39] Armando Solar-Lezama, et al. Representing Partial Programs with Blended Abstract Semantics, 2020, arXiv.
[40] Dorsa Sadigh, et al. Learning Adaptive Language Interfaces through Decomposition, 2020, INTEXSEMPAR.
[41] Yoav Artzi, et al. Executing Instructions in Situated Collaborative Interactions, 2019, EMNLP.
[42] Louis-Philippe Morency, et al. Visual Referring Expression Recognition: What Do Systems Actually Learn?, 2018, NAACL.
[43] Stefan Lee, et al. Overcoming Language Priors in Visual Question Answering with Adversarial Regularization, 2018, NeurIPS.
[44] Percy Liang, et al. From Language to Programs: Bridging Reinforcement Learning and Maximum Marginal Likelihood, 2017, ACL.
[45] Yoav Artzi, et al. A Corpus for Reasoning about Natural Language Grounded in Photographs, 2018, ACL.
[46] Peter Stone, et al. Improving Grounded Natural Language Understanding through Human-Robot Dialog, 2019, ICRA.
[47] Furu Wei, et al. VL-BERT: Pre-training of Generic Visual-Linguistic Representations, 2019, ICLR.
[48] Trevor Darrell, et al. Generating Visual Explanations, 2016, ECCV.
[49] Yoav Artzi, et al. A Corpus of Natural Language for Visual Reasoning, 2017, ACL.
[50] Christopher D. Manning, et al. Naturalizing a Programming Language via Interactive Learning, 2017, ACL.
[51] Aaron C. Courville, et al. FiLM: Visual Reasoning with a General Conditioning Layer, 2017, AAAI.
[52] Ruslan Salakhutdinov, et al. Gated-Attention Architectures for Task-Oriented Language Grounding, 2017, AAAI.
[53] Arthur Szlam, et al. CraftAssist Instruction Parsing: Semantic Parsing for a Voxel-World Assistant, 2020, ACL.
[54] Li Fei-Fei, et al. CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning, 2017, CVPR.
[55] Alexander M. Rush, et al. What is Learned in Visually Grounded Neural Syntax Acquisition, 2020, ACL.
[56] Xinlei Chen, et al. CoDraw: Collaborative Drawing as a Testbed for Grounded Goal-driven Communication, 2017, ACL.
[57] Wei Xu, et al. Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks, 2016, CVPR.
[58] Nassir Navab, et al. Guide Me: Interacting with Deep Networks, 2018, CVPR.
[59] Andrew Bennett, et al. Mapping Instructions to Actions in 3D Environments with Visual Goal Prediction, 2018, EMNLP.
[60] Jeff Johnson, et al. Billion-Scale Similarity Search with GPUs, 2017, IEEE Transactions on Big Data.
[61] Radu Soricut, et al. Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning, 2018, ACL.
[62] Pieter Abbeel, et al. The MineRL BASALT Competition on Learning from Human Feedback, 2021, arXiv.
[63] Yuxin Peng, et al. Fine-Grained Image Classification via Combining Vision and Language, 2017, CVPR.
[64] Trevor Darrell, et al. Explainable Neural Computation via Stack Neural Module Networks, 2018, ECCV.
[65] Trevor Darrell, et al. Grounding Visual Explanations, 2018, ECCV.
[66] Jacob Andreas, et al. Unnatural Language Processing: Bridging the Gap Between Synthetic and Natural Language Data, 2020, arXiv.