Meta Reinforcement Learning with Task Embedding and Shared Policy

Despite significant progress, deep reinforcement learning (RL) remains data-inefficient and generalizes poorly. Recent efforts apply meta-learning to learn a meta-learner from a set of RL tasks so that a novel but related task can be solved quickly. Although each task has its specifics, different tasks in meta-RL are generally similar at a high level. However, most meta-RL methods do not explicitly and adequately model the task-specific and shared information across tasks, which limits both how well they learn the training tasks and how well they generalize to novel ones. In this paper, we propose to capture the shared information on the one hand and to meta-learn how to quickly abstract the task-specific information on the other. Methodologically, we train an SGD meta-learner that quickly optimizes a task encoder for each task, which generates a task embedding from past experience. Meanwhile, we learn a policy that is shared across all tasks and conditioned on the task embedding. Empirical results on four simulated tasks show that our method learns better on both training and novel tasks and attains returns up to 3 to 4 times higher than the baselines.
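The scheme described above can be pictured as three pieces: a task encoder that maps past experience to an embedding, a shared policy conditioned on that embedding, and a Meta-SGD-style inner step that adapts the encoder per task with learned per-parameter learning rates. The sketch below is a hypothetical PyTorch illustration of that structure; the module names, layer sizes, and the `inner_adapt` helper are assumptions for exposition, not the paper's implementation.

```python
# Hypothetical sketch (not the paper's code): task encoder + shared policy,
# with a Meta-SGD-style inner step that adapts the encoder per task using
# meta-learned per-parameter learning rates.
import torch
import torch.nn as nn


class TaskEncoder(nn.Module):
    """Maps a set of past transitions (state, action, reward) to one task embedding."""
    def __init__(self, transition_dim: int, embed_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(transition_dim, 64), nn.ReLU(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, transitions: torch.Tensor) -> torch.Tensor:
        # Average over transitions so the embedding ignores their ordering.
        return self.net(transitions).mean(dim=0)


class SharedPolicy(nn.Module):
    """Policy head shared across all tasks, conditioned on the task embedding."""
    def __init__(self, state_dim: int, embed_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + embed_dim, 64), nn.Tanh(),
            nn.Linear(64, action_dim),
        )

    def forward(self, states: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        z = z.expand(states.shape[0], -1)  # broadcast the embedding over the batch
        return self.net(torch.cat([states, z], dim=-1))


def inner_adapt(encoder: TaskEncoder, alphas, transitions, task_loss_fn):
    """One Meta-SGD inner step: theta' = theta - alpha * grad(L_task).
    `alphas` are per-parameter learning rates meta-learned by the outer loop."""
    loss = task_loss_fn(encoder, transitions)
    params = list(encoder.parameters())
    grads = torch.autograd.grad(loss, params, create_graph=True)
    # Adapted parameters; a functional forward pass would use them to re-embed the task.
    return [p - a * g for p, a, g in zip(params, alphas, grads)]
```

In a full training loop, the adapted encoder would re-embed the task from fresh experience, the shared policy would be rolled out with that embedding, and the outer loop would update the policy, the encoder initialization, and the learning rates with a policy-gradient objective; those training details are assumptions here rather than a statement of the paper's exact procedure.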
