Evolution Strategies as a Scalable Alternative to Reinforcement Learning

We explore the use of Evolution Strategies (ES), a class of black-box optimization algorithms, as an alternative to popular MDP-based RL techniques such as Q-learning and Policy Gradients. Experiments on MuJoCo and Atari show that ES is a viable solution strategy that scales extremely well with the number of CPUs available: by using a novel communication strategy based on common random numbers, our ES implementation only needs to communicate scalars, making it possible to scale to over a thousand parallel workers. This allows us to solve 3D humanoid walking in 10 minutes and obtain competitive results on most Atari games after one hour of training. In addition, we highlight several advantages of ES as a black-box optimization technique: it is invariant to action frequency and delayed rewards, tolerant of extremely long horizons, and does not need temporal discounting or value function approximation.
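To make the scalar-communication idea concrete, below is a minimal single-process sketch of an ES update with seed-based perturbation sharing. It is an illustrative assumption of how common random numbers can reduce communication, not the paper's actual implementation; names such as `evaluate_return`, `sigma`, `alpha`, and `n_workers` are hypothetical.

```python
import numpy as np

def evaluate_return(theta):
    """Stand-in for an episode rollout; returns a scalar episodic return.

    Toy objective for illustration only: the optimum is theta = 0.
    """
    return -np.sum(theta ** 2)

theta = np.random.randn(10)   # policy parameters
sigma, alpha = 0.1, 0.02      # perturbation scale and learning rate
n_workers = 8                 # number of simulated parallel workers

for step in range(200):
    seeds, returns = [], []
    # Each "worker" draws its perturbation from a known seed and reports
    # only (seed, scalar return) -- this pair is the entire communication.
    for _ in range(n_workers):
        seed = np.random.randint(2**31)
        eps = np.random.default_rng(seed).standard_normal(theta.shape)
        returns.append(evaluate_return(theta + sigma * eps))
        seeds.append(seed)

    # Every worker can reconstruct all perturbations locally from the seeds,
    # so all workers compute the same update and stay synchronized.
    returns = np.asarray(returns)
    adv = (returns - returns.mean()) / (returns.std() + 1e-8)
    grad = np.zeros_like(theta)
    for seed, a in zip(seeds, adv):
        eps = np.random.default_rng(seed).standard_normal(theta.shape)
        grad += a * eps
    theta += alpha / (n_workers * sigma) * grad
```

Because each worker broadcasts only a seed and a scalar return, per-step bandwidth grows with the number of workers rather than with the parameter dimension, which is what allows this scheme to scale to many parallel workers.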
