论文信息 - Meta Reinforcement Learning with Distribution of Exploration Parameters Learned by Evolution Strategies

Meta Reinforcement Learning with Distribution of Exploration Parameters Learned by Evolution Strategies

In this paper, we propose a novel meta-learning method in a reinforcement learning setting, based on evolution strategies (ES), exploration in parameter space and deterministic policy gradients. ES methods are easy to parallelize, which is desirable for modern training architectures; however, such methods typically require a huge number of samples for effective training. We use deterministic policy gradients during adaptation and other techniques to compensate for the sample-efficiency problem while maintaining the inherent scalability of ES methods. We demonstrate that our method achieves good results compared to gradient-based meta-learning in high-dimensional control tasks in the MuJoCo simulator. In addition, because of gradient-free methods in the meta-training phase, which do not need information about gradients and policies in adaptation training, we predict and confirm our algorithm performs better in tasks that need multi-step adaptation.

Kehan Yang | Yufeng Yuan | Yiming Shen | Simon Cheng Liu

[1] Thomas Bäck,et al. A Survey of Evolution Strategies , 1991, ICGA.

[2] Yuval Tassa,et al. Continuous control with deep reinforcement learning , 2015, ICLR.

[3] Pieter Abbeel,et al. Benchmarking Deep Reinforcement Learning for Continuous Control , 2016, ICML.

[4] Tomás Svoboda,et al. Safe Exploration Techniques for Reinforcement Learning - An Overview , 2014, MESAS.

[5] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.

[6] Sergey Levine,et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[7] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.

[8] Alex Graves,et al. Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.

[9] G. G. Stokes. "J." , 1890, The New Yale Book of Quotations.

[10] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.

[11] Yuval Tassa,et al. MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[12] Marcin Andrychowicz,et al. Parameter Space Noise for Exploration , 2017, ICLR.

[13] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[14] Xi Chen,et al. Evolution Strategies as a Scalable Alternative to Reinforcement Learning , 2017, ArXiv.

[15] Tom Schaul,et al. Natural Evolution Strategies , 2008, 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence).

[16] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.