Why Build an Assistant in Minecraft?

In this document, we describe a rationale for a research program to build an open "assistant" in the game Minecraft, in order to make progress on natural language understanding and learning from dialogue.
