Executing Instructions in Situated Collaborative Interactions

We study a collaborative scenario where a user not only instructs a system to complete tasks, but also acts alongside it. This allows the user to adapt to the system's abilities by changing their language or by simply accomplishing some tasks themselves, and requires the system to effectively recover from errors as the user strategically assigns it new goals. We build a game environment to study this scenario, and learn to map user instructions to system actions. We introduce a learning approach focused on recovery from cascading errors between instructions, and modeling methods to explicitly reason about instructions with multiple goals. We evaluate with a new protocol that uses recorded interactions and online games with human users, and observe how users adapt to the system's abilities.
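The idea behind recovering from cascading errors between instructions can be pictured with a minimal sketch: instead of always starting each instruction from the state recorded in the human demonstration, training sometimes starts from the state the current policy itself reaches after executing the previous instruction, so the model sees the situations its own mistakes create. The sketch below assumes hypothetical `env`, `policy`, and `interaction` interfaces and is an illustration of the general strategy, not the exact procedure used in the paper.

```python
import random

def collect_training_states(env, policy, interaction, p_learner_state=0.5):
    """Choose a start state for each instruction in a recorded interaction.

    With probability p_learner_state, the next instruction starts from the
    state reached by the current policy's own execution of the previous
    instruction (exposing the learner to its cascading errors); otherwise it
    starts from the state recorded in the human demonstration.
    All interfaces (env, policy, interaction) are hypothetical stand-ins.
    """
    examples = []
    state = env.reset(interaction.initial_state)
    for instruction, demo_actions, demo_end_state in interaction.segments:
        # Record a training example: map this instruction, executed from
        # this state, to the demonstrated action sequence.
        examples.append((state, instruction, demo_actions))

        # Decide where the *next* instruction will start.
        if random.random() < p_learner_state:
            # Roll out the current policy so later instructions are
            # conditioned on states the system actually reaches.
            state = env.execute(state, policy.predict(state, instruction))
        else:
            # Fall back to the state recorded in the demonstration.
            state = demo_end_state
    return examples
```

Mixing the two kinds of start states keeps supervision anchored in the demonstrated action sequences while still exposing the model to the states produced by its own errors.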
