Guiding Policies with Language via Meta-Learning

Behavioral skills or policies for autonomous agents are conventionally learned from reward functions, via reinforcement learning, or from demonstrations, via imitation learning. However, both modes of task specification have their disadvantages: reward functions require manual engineering, while demonstrations require a human expert to be able to actually perform the task in order to generate the demonstration. Instruction following from natural language instructions provides an appealing alternative: in the same way that we can specify goals to other humans simply by speaking or writing, we would like to be able to specify tasks for our machines. However, a single instruction may be insufficient to fully communicate our intent or, even if it is, may be insufficient for an autonomous agent to actually understand how to perform the desired task. In this work, we propose an interactive formulation of the task specification problem, where iterative language corrections are provided to an autonomous agent, guiding it in acquiring the desired skill. Our proposed language-guided policy learning algorithm can integrate an instruction and a sequence of corrections to acquire new skills very quickly. In our experiments, we show that this method can enable a policy to follow instructions and corrections for simulated navigation and manipulation tasks, substantially outperforming direct, non-interactive instruction following.

[1]  Pieter Abbeel,et al.  Apprenticeship learning via inverse reinforcement learning , 2004, ICML.

[2]  Andrea Lockerd Thomaz,et al.  Reinforcement Learning with Human Teachers: Understanding How People Want to Teach Robots , 2006, ROMAN 2006 - The 15th IEEE International Symposium on Robot and Human Interactive Communication.

[3]  Benjamin Kuipers,et al.  Walk the Talk: Connecting Language, Knowledge, and Action in Route Instructions , 2006, AAAI.

[4]  Anind K. Dey,et al.  Maximum Entropy Inverse Reinforcement Learning , 2008, AAAI.

[5]  Brett Browning,et al.  A survey of robot learning from demonstration , 2009, Robotics Auton. Syst..

[6]  Luke S. Zettlemoyer,et al.  Reinforcement Learning for Mapping Instructions to Actions , 2009, ACL.

[7]  Daniel Jurafsky,et al.  Learning to Follow Navigational Directions , 2010, ACL.

[8]  Farbod Fahimi,et al.  Online human training of a myoelectric prosthesis controller via actor-critic reinforcement learning , 2011, 2011 IEEE International Conference on Rehabilitation Robotics.

[9]  Raymond J. Mooney,et al.  Learning to Interpret Natural Language Navigation Instructions from Observations , 2011, Proceedings of the AAAI Conference on Artificial Intelligence.

[10]  Michèle Sebag,et al.  Preference-Based Policy Learning , 2011, ECML/PKDD.

[11]  Geoffrey J. Gordon,et al.  A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.

[12]  Matthew R. Walter,et al.  Understanding Natural Language Commands for Robotic Navigation and Mobile Manipulation , 2011, AAAI.

[13]  Michèle Sebag,et al.  APRIL: Active Preference-learning based Reinforcement Learning , 2012, ECML/PKDD.

[14]  Yuval Tassa,et al.  MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[15]  Raymond J. Mooney,et al.  Adapting Discriminative Reranking to Grounded Language Learning , 2013, ACL.

[16]  Luke S. Zettlemoyer,et al.  Weakly Supervised Learning of Semantic Parsers for Mapping Instructions to Actions , 2013, TACL.

[17]  Dan Klein,et al.  Alignment-Based Compositional Semantics for Instruction Following , 2015, EMNLP.

[18]  Christopher D. Manning,et al.  Learning Language Games through Interaction , 2016, ACL.

[19]  Daan Wierstra,et al.  Meta-Learning with Memory-Augmented Neural Networks , 2016, ICML.

[20]  Peter L. Bartlett,et al.  RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning , 2016, ArXiv.

[21]  Oriol Vinyals,et al.  Matching Networks for One Shot Learning , 2016, NIPS.

[22]  Jing He,et al.  A Sequence-to-Sequence Model for User Simulation in Spoken Dialogue Systems , 2016, INTERSPEECH.

[23]  Pieter Abbeel,et al.  Meta-Learning with Temporal Convolutions , 2017, ArXiv.

[24]  Zeb Kurth-Nelson,et al.  Learning to reinforcement learn , 2016, CogSci.

[25]  Honglak Lee,et al.  Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning , 2017, ICML.

[26]  Shane Legg,et al.  Deep Reinforcement Learning from Human Preferences , 2017, NIPS.

[27]  Li Zhang,et al.  Learning to Learn: Meta-Critic Networks for Sample Efficient Learning , 2017, ArXiv.

[28]  John Langford,et al.  Mapping Instructions and Visual Observations to Actions with Reinforcement Learning , 2017, EMNLP.

[29]  Richard S. Zemel,et al.  Prototypical Networks for Few-shot Learning , 2017, NIPS.

[30]  Sergey Levine,et al.  Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[31]  Alec Radford,et al.  Proximal Policy Optimization Algorithms , 2017, ArXiv.

[32]  Sergey Levine,et al.  Variational Inverse Control with Events: A General Framework for Data-Driven Reward Definition , 2018, NeurIPS.

[33]  Sergey Levine,et al.  Learning to Adapt: Meta-Learning for Model-Based Control , 2018, ArXiv.

[34]  Sergey Levine,et al.  Learning Robust Rewards with Adversarial Inverse Reinforcement Learning , 2017, ICLR 2017.

[35]  Sergey Levine,et al.  Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[36]  Regina Barzilay,et al.  Representation Learning for Grounded Spatial Reasoning , 2017, TACL.

[37]  Peter Stone,et al.  Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces , 2017, AAAI.

[38]  Anca D. Dragan,et al.  Shared Autonomy via Deep Reinforcement Learning , 2018, Robotics: Science and Systems.

[39]  Dan Klein,et al.  Unified Pragmatic Models for Generating and Following Instructions , 2017, NAACL.

[40]  Dan Klein,et al.  Learning with Latent Language , 2017, NAACL.

[41]  Sergey Levine,et al.  Few-Shot Goal Inference for Visuomotor Learning and Planning , 2018, CoRL.

[42]  Anca D. Dragan,et al.  Learning a Prior over Intent via Meta-Inverse Reinforcement Learning , 2018, ICML.