论文信息 - Guiding Policies with Language via Meta-Learning

Guiding Policies with Language via Meta-Learning

Behavioral skills or policies for autonomous agents are conventionally learned from reward functions, via reinforcement learning, or from demonstrations, via imitation learning. However, both modes of task specification have their disadvantages: reward functions require manual engineering, while demonstrations require a human expert to be able to actually perform the task in order to generate the demonstration. Instruction following from natural language instructions provides an appealing alternative: in the same way that we can specify goals to other humans simply by speaking or writing, we would like to be able to specify tasks for our machines. However, a single instruction may be insufficient to fully communicate our intent or, even if it is, may be insufficient for an autonomous agent to actually understand how to perform the desired task. In this work, we propose an interactive formulation of the task specification problem, where iterative language corrections are provided to an autonomous agent, guiding it in acquiring the desired skill. Our proposed language-guided policy learning algorithm can integrate an instruction and a sequence of corrections to acquire new skills very quickly. In our experiments, we show that this method can enable a policy to follow instructions and corrections for simulated navigation and manipulation tasks, substantially outperforming direct, non-interactive instruction following.

[1] Pieter Abbeel,et al. Apprenticeship learning via inverse reinforcement learning , 2004, ICML.

[2] Andrea Lockerd Thomaz,et al. Reinforcement Learning with Human Teachers: Understanding How People Want to Teach Robots , 2006, ROMAN 2006 - The 15th IEEE International Symposium on Robot and Human Interactive Communication.

[3] Benjamin Kuipers,et al. Walk the Talk: Connecting Language, Knowledge, and Action in Route Instructions , 2006, AAAI.

[4] Anind K. Dey,et al. Maximum Entropy Inverse Reinforcement Learning , 2008, AAAI.

[5] Brett Browning,et al. A survey of robot learning from demonstration , 2009, Robotics Auton. Syst..

[6] Luke S. Zettlemoyer,et al. Reinforcement Learning for Mapping Instructions to Actions , 2009, ACL.

[7] Daniel Jurafsky,et al. Learning to Follow Navigational Directions , 2010, ACL.

[8] Farbod Fahimi,et al. Online human training of a myoelectric prosthesis controller via actor-critic reinforcement learning , 2011, 2011 IEEE International Conference on Rehabilitation Robotics.

[9] Raymond J. Mooney,et al. Learning to Interpret Natural Language Navigation Instructions from Observations , 2011, Proceedings of the AAAI Conference on Artificial Intelligence.

[10] Michèle Sebag,et al. Preference-Based Policy Learning , 2011, ECML/PKDD.

[11] Geoffrey J. Gordon,et al. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.

[12] Matthew R. Walter,et al. Understanding Natural Language Commands for Robotic Navigation and Mobile Manipulation , 2011, AAAI.

[13] Michèle Sebag,et al. APRIL: Active Preference-learning based Reinforcement Learning , 2012, ECML/PKDD.

[14] Yuval Tassa,et al. MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[15] Raymond J. Mooney,et al. Adapting Discriminative Reranking to Grounded Language Learning , 2013, ACL.

[16] Luke S. Zettlemoyer,et al. Weakly Supervised Learning of Semantic Parsers for Mapping Instructions to Actions , 2013, TACL.

[17] Dan Klein,et al. Alignment-Based Compositional Semantics for Instruction Following , 2015, EMNLP.

[18] Christopher D. Manning,et al. Learning Language Games through Interaction , 2016, ACL.

[19] Daan Wierstra,et al. Meta-Learning with Memory-Augmented Neural Networks , 2016, ICML.

[20] Peter L. Bartlett,et al. RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning , 2016, ArXiv.

[21] Oriol Vinyals,et al. Matching Networks for One Shot Learning , 2016, NIPS.

[22] Jing He,et al. A Sequence-to-Sequence Model for User Simulation in Spoken Dialogue Systems , 2016, INTERSPEECH.

[23] Pieter Abbeel,et al. Meta-Learning with Temporal Convolutions , 2017, ArXiv.

[24] Zeb Kurth-Nelson,et al. Learning to reinforcement learn , 2016, CogSci.

[25] Honglak Lee,et al. Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning , 2017, ICML.

[26] Shane Legg,et al. Deep Reinforcement Learning from Human Preferences , 2017, NIPS.