Learning to Follow Language Instructions with Compositional Policies

We propose a framework that learns to execute natural language instructions in an environment consisting of goal-reaching tasks that share components of their task descriptions. Our approach leverages the compositionality of both value functions and language, with the aim of reducing the sample complexity of learning novel tasks. First, we train a reinforcement learning agent to learn value functions that can subsequently be composed through a Boolean algebra to solve novel tasks. Second, we fine-tune a seq2seq model pretrained on web-scale corpora to map language to the logical expressions that specify the required value function compositions. Evaluating our agent in the BabyAI domain, we observe an 86% reduction in the number of training steps needed to learn a second task after mastering the first. Ablation studies further indicate that it is the combination of compositional value functions and language representations that allows the agent to generalize quickly to new tasks.
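To make the first component concrete, here is a minimal sketch of Boolean value-function composition in the style of Tasse et al.'s task algebra, where conjunction is an element-wise minimum, disjunction an element-wise maximum, and negation a reflection between the maximum and minimum attainable values. The Q-tables, value bounds, and task names below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Assume tabular Q-functions of shape (n_states, n_actions) learned for two
# base goal-reaching tasks, e.g. "reach a blue object" and "reach a box".
V_MAX, V_MIN = 2.0, -0.1  # assumed upper/lower bounds on attainable value

def q_and(q1, q2):
    """Conjunction: high value only where both base tasks assign high value."""
    return np.minimum(q1, q2)

def q_or(q1, q2):
    """Disjunction: high value where either base task assigns high value."""
    return np.maximum(q1, q2)

def q_not(q):
    """Negation: reflect values between the attainable bounds."""
    return (V_MAX + V_MIN) - q

# Stand-ins for learned value functions (100 states, 4 actions).
q_blue = np.random.rand(100, 4)
q_box = np.random.rand(100, 4)

# "Reach a blue box" is composed without further training:
q_blue_box = q_and(q_blue, q_box)
greedy_policy = q_blue_box.argmax(axis=1)  # action per state
```

The appeal of this scheme is that the composed task requires no further environment interaction once the base value functions are learned, which is what allows the sample-complexity savings the abstract reports.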
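The second component, mapping instructions to composition expressions, can be sketched as ordinary seq2seq fine-tuning. The snippet below uses T5 via HuggingFace Transformers as one plausible instantiation; the training pairs, expression grammar, and hyperparameters are assumptions for illustration, not the paper's setup.

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Illustrative (instruction, logical-expression) pairs; the real dataset and
# expression grammar are defined by the tasks in the environment.
pairs = [
    ("go to a blue box", "blue AND box"),
    ("go to a ball or a key", "ball OR key"),
    ("go to a box that is not red", "box AND (NOT red)"),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
model.train()
for instruction, expression in pairs:
    inputs = tokenizer(instruction, return_tensors="pt")
    labels = tokenizer(expression, return_tensors="pt").input_ids
    loss = model(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# At test time, decode the expression for a novel instruction; the resulting
# string selects which min/max composition of value functions to execute.
model.eval()
ids = tokenizer("go to a blue ball", return_tensors="pt").input_ids
out = model.generate(ids, max_length=16)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```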
