Grounding Language to Non-Markovian Tasks with No Supervision of Task Specifications

Natural language instructions often exhibit sequential constraints rather than being simply goal-oriented, for example "go around the lake and then travel north until the intersection". Existing approaches map such expressions to Linear Temporal Logic (LTL) but require an expensive dataset of LTL expressions paired with English sentences. We introduce an approach that learns to map English to LTL expressions given only pairs of English sentences and trajectories, enabling a robot to understand commands with sequential constraints. We reward the produced logical forms using formal methods of LTL progression: each candidate LTL expression is progressed against the ground-truth trajectory, represented as a sequence of states, so that no LTL annotations are needed during training. We evaluate in two ways: on the SAIL dataset, a benchmark artificial environment of 3,266 trajectories paired with language commands, and on 10 newly collected real-world environments of roughly the same size. Our model correctly interprets natural language commands with 76.9% accuracy on average. We demonstrate the end-to-end process in real time in simulation: starting with only a natural language instruction and an initial robot state, the model trained on trajectories produces a logical form, and an LTL planner then finds a trajectory in the environment that satisfies the sequential constraints.
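To make the reward signal concrete, the following is a minimal sketch of LTL progression over a finite trajectory, used to score a candidate logical form against a ground-truth state sequence. The formula encoding (nested tuples), the proposition names, and the functions progress and reward are illustrative assumptions for this sketch, not the paper's implementation.

```python
# Minimal sketch: LTL progression as a weak-supervision reward.
# Formulas are nested tuples over propositions (strings):
#   True / False, "p", ("not", f), ("and", f, g), ("or", f, g),
#   ("next", f), ("until", f, g), ("eventually", f), ("always", f)

def progress(f, state):
    """Progress formula f through one state (a set of true propositions)."""
    if f is True or f is False:
        return f
    if isinstance(f, str):                      # atomic proposition
        return f in state
    op = f[0]
    if op == "not":
        sub = progress(f[1], state)
        return (not sub) if isinstance(sub, bool) else ("not", sub)
    if op == "and":
        return _and(progress(f[1], state), progress(f[2], state))
    if op == "or":
        return _or(progress(f[1], state), progress(f[2], state))
    if op == "next":
        return f[1]
    if op == "until":       # f1 U f2  ==  f2 or (f1 and X(f1 U f2))
        return _or(progress(f[2], state), _and(progress(f[1], state), f))
    if op == "eventually":  # F f  ==  f or X F f
        return _or(progress(f[1], state), f)
    if op == "always":      # G f  ==  f and X G f
        return _and(progress(f[1], state), f)
    raise ValueError(f"unknown operator {op}")

def _and(a, b):
    if a is False or b is False: return False
    if a is True: return b
    if b is True: return a
    return ("and", a, b)

def _or(a, b):
    if a is True or b is True: return True
    if a is False: return b
    if b is False: return a
    return ("or", a, b)

def reward(formula, trajectory):
    """1.0 if the candidate formula is satisfied by the trajectory, else 0.0."""
    for state in trajectory:
        formula = progress(formula, state)
        if formula is True:
            return 1.0
        if formula is False:
            return 0.0
    # Obligations still pending at the end of a finite trajectory are unmet.
    return 0.0

# Purely illustrative: the example command might map to
#   F(lake and X(north U intersection))
phi = ("eventually", ("and", "lake", ("next", ("until", "north", "intersection"))))
traj = [{"start"}, {"lake"}, {"north"}, {"north", "intersection"}]
print(reward(phi, traj))   # 1.0
```

A candidate logical form produced by the parser thus receives reward 1.0 only when progressing it through the observed state sequence discharges all of its temporal obligations, which is what allows training without any annotated LTL expressions.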
