Situated Mapping of Sequential Instructions to Actions with Single-step Reward Observation

We propose a learning approach for mapping context-dependent sequential instructions to actions. We address the problem of discourse and state dependencies with an attention-based model that considers both the history of the interaction and the state of the world. To train from start and goal states without access to demonstrations, we propose SESTRA, a learning algorithm that takes advantage of single-step reward observations and immediate expected reward maximization. We evaluate on the SCONE domains and show absolute accuracy improvements of 9.8%–25.3% across the domains over approaches that use high-level logical representations.
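The objective described above can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's implementation: it stands in for the neural attention model with a plain softmax policy over a small action set, and assumes the environment exposes a single-step reward observation for every available action at the current state, so the learner can directly maximize the immediate expected reward J = Σ_a π(a) r(a).

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of action logits."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def sestra_step_gradient(logits, rewards):
    """One single-step update in the style of immediate expected reward
    maximization: given reward observations for *all* actions at the
    current state, compute J = sum_a pi(a) * r(a) and its gradient
    with respect to the policy logits.

    logits  -- hypothetical per-action scores (here a raw vector; in the
               paper these come from the attention-based model)
    rewards -- observed single-step reward for each action
    """
    pi = softmax(logits)
    expected = pi @ rewards          # immediate expected reward J
    # dJ/dlogit_a = pi_a * (r_a - J): raise the probability of actions
    # whose observed reward beats the current expectation, lower the rest.
    grad = pi * (rewards - expected)
    return expected, grad
```

A gradient-ascent step along `grad` shifts probability mass toward higher-reward actions; because every action's reward is observed at each step, no sampled rollouts or baselines are needed for this one-step objective.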
