论文信息 - Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning

Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning

Deep artificial neural networks (DNNs) are typically trained via gradient-based learning algorithms, namely backpropagation. Evolution strategies (ES) can rival backprop-based algorithms such as Q-learning and policy gradients on challenging deep reinforcement learning (RL) problems. However, ES can be considered a gradient-based algorithm because it performs stochastic gradient descent via an operation similar to a finite-difference approximation of the gradient. That raises the question of whether non-gradient-based evolutionary algorithms can work at DNN scales. Here we demonstrate they can: we evolve the weights of a DNN with a simple, gradient-free, population-based genetic algorithm (GA) and it performs well on hard deep RL problems, including Atari and humanoid locomotion. The Deep GA successfully evolves networks with over four million free parameters, the largest neural networks ever evolved with a traditional evolutionary algorithm. These results (1) expand our sense of the scale at which GAs can operate, (2) suggest intriguingly that in some cases following the gradient is not the best choice for optimizing performance, and (3) make immediately available the multitude of techniques that have been developed in the neuroevolution community to improve performance on RL problems. To demonstrate the latter, we show that combining DNNs with novelty search, which was designed to encourage exploration on tasks with deceptive or sparse reward functions, can solve a high-dimensional problem on which reward-maximizing algorithms (e.g. DQN, A3C, ES, and the GA) fail. Additionally, the Deep GA parallelizes better than ES, A3C, and DQN, and enables a state-of-the-art compact encoding technique that can represent million-parameter DNNs in thousands of bytes.

[1] L. C. Stayton,et al. On the effectiveness of crossover in simulated evolutionary optimization. , 1994, Bio Systems.

[2] Dan Boneh,et al. On genetic algorithms , 1995, COLT '95.

[3] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.

[4] Randy L. Haupt,et al. Practical Genetic Algorithms , 1998 .

[5] Arthur E. Hoerl,et al. Ridge Regression: Biased Estimation for Nonorthogonal Problems , 2000, Technometrics.

[6] A. E. Hoerl,et al. Ridge regression: biased estimation for nonorthogonal problems , 2000 .

[7] Goldberg,et al. Genetic algorithms , 1993, Robust Control Systems with Genetic Algorithms.

[8] Luigi Fortuna,et al. Chaotic sequences to improve the performance of evolutionary algorithms , 2003, IEEE Trans. Evol. Comput..

[9] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[10] Chang Wook Ahn,et al. On the practical genetic algorithms , 2005, GECCO '05.

[11] H. Robbins. A Stochastic Approximation Method , 1951 .

[12] Kenneth O. Stanley,et al. Compositional Pattern Producing Networks : A Novel Abstraction of Development , 2007 .

[13] Tom Schaul,et al. Natural Evolution Strategies , 2008, 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence).

[14] Stéphane Doncieux,et al. Overcoming the bootstrap problem in evolutionary robotics using behavioral diversity , 2009, 2009 IEEE Congress on Evolutionary Computation.

[15] Kenneth O. Stanley. A Hypercube-Based Indirect Encoding for Evolving Large-Scale Neural Networks , 2009 .

[16] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[17] Frank Sehnke,et al. Parameter-exploring policy gradients , 2010, Neural Networks.

[18] Kenneth O. Stanley,et al. Abandoning Objectives: Evolution Through the Search for Novelty Alone , 2011, Evolutionary Computation.

[19] Kenneth O. Stanley,et al. Evolving a diversity of virtual creatures through novelty search and local competition , 2011, GECCO '11.

[20] Kenneth O. Stanley,et al. On the Performance of Indirect Encoding Across the Continuum of Regularity , 2011, IEEE Transactions on Evolutionary Computation.

[21] Dong Yu,et al. Conversational Speech Transcription Using Context-Dependent Deep Neural Networks , 2012, ICML.

[22] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[23] Yuval Tassa,et al. MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[24] Charles Ofria,et al. Natural Selection Fails to Optimize Mutation Rates for Long-Term Adaptation on Rugged Fitness Landscapes , 2008, ECAL.

[25] Yoshua Bengio,et al. On the Properties of Neural Machine Translation: Encoder–Decoder Approaches , 2014, SSST@EMNLP.

[26] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[27] Surya Ganguli,et al. On the saddle point problem for non-convex optimization , 2014, ArXiv.

[28] Antoine Cully,et al. Robots that can adapt like animals , 2014, Nature.

[29] Jean-Baptiste Mouret,et al. Illuminating search spaces by mapping elites , 2015, ArXiv.

[30] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[31] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.

[32] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[33] Shane Legg,et al. Massively Parallel Methods for Deep Reinforcement Learning , 2015, ArXiv.

[34] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.

[35] Marc G. Bellemare,et al. The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract) , 2012, IJCAI.