Importance mixing: Improving sample reuse in evolutionary policy search methods

Deep neuroevolution, that is, evolutionary policy search based on deep neural networks, has recently emerged as a competitor to deep reinforcement learning algorithms due to its better parallelization capabilities. However, these methods still suffer from far worse sample efficiency. In this paper we investigate whether a mechanism known as "importance mixing" can significantly improve their sample efficiency. We provide a didactic presentation of importance mixing and explain how it can be extended to reuse more samples. Then, through an empirical comparison on a simple benchmark, we show that although importance mixing does improve sample efficiency, the resulting methods remain far less sample efficient than deep reinforcement learning, although they are more stable.
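To make the mechanism concrete, the sketch below is a minimal NumPy illustration of the classic importance mixing procedure for evolution strategies (rejection sampling over the previous generation, followed by reverse sampling from the new search distribution), assuming a diagonal Gaussian search distribution. The function names, the minimal refresh rate alpha, and the top-up loop are illustrative assumptions, not the paper's code, and the paper's extended variant that reuses more samples is not shown.

```python
import numpy as np

def log_gaussian_pdf(x, mean, std):
    """Log-density of a diagonal Gaussian at x, summed over dimensions."""
    return -0.5 * np.sum(((x - mean) / std) ** 2
                         + 2.0 * np.log(std) + np.log(2.0 * np.pi), axis=-1)

def importance_mixing(old_samples, old_mean, old_std, new_mean, new_std,
                      pop_size, alpha=0.01, rng=None):
    """Reuse samples drawn under the old search distribution when building
    the new population.

    Step 1 (rejection): keep each old sample with probability
        min(1, (1 - alpha) * p_new(x) / p_old(x)).
    Step 2 (reverse sampling): draw fresh samples from p_new and accept each
        with probability max(alpha, 1 - p_old(x) / p_new(x)) until the
        population is full.  Only the fresh samples need new evaluations.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Step 1: probabilistically keep old samples still likely under p_new.
    log_ratio = (log_gaussian_pdf(old_samples, new_mean, new_std)
                 - log_gaussian_pdf(old_samples, old_mean, old_std))
    keep_prob = np.minimum(1.0, (1.0 - alpha) * np.exp(log_ratio))
    kept = old_samples[rng.random(len(old_samples)) < keep_prob][:pop_size]

    # Step 2: top up the population with fresh samples from p_new.
    fresh = []
    while len(kept) + len(fresh) < pop_size:
        x = new_mean + new_std * rng.standard_normal(new_mean.shape)
        lr = (log_gaussian_pdf(x, old_mean, old_std)
              - log_gaussian_pdf(x, new_mean, new_std))
        if rng.random() < max(alpha, 1.0 - np.exp(lr)):
            fresh.append(x)

    population = np.vstack([kept, np.array(fresh)]) if fresh else kept
    reused = np.arange(pop_size) < len(kept)   # True for reused members
    return population, reused
```

In this sketch, alpha bounds the minimal fraction of freshly sampled (and thus freshly evaluated) individuals per generation: alpha = 1 recovers plain sampling with no reuse, while small alpha maximizes reuse when the search distribution moves slowly.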
