Guided evolutionary strategies: augmenting random search with surrogate gradients

Many applications in machine learning require optimizing a function whose true gradient is unknown, but where surrogate gradient information (directions that may be correlated with, but not necessarily identical to, the true gradient) is available instead. This arises when an approximate gradient is easier to compute than the full gradient (e.g. in meta-learning or unrolled optimization), or when a true gradient is intractable and is replaced with a surrogate (e.g. in certain reinforcement learning applications, or when using synthetic gradients). We propose Guided Evolutionary Strategies, a method for optimally using surrogate gradient directions along with random search. We define a search distribution for evolutionary strategies that is elongated along a guiding subspace spanned by the surrogate gradients. This allows us to estimate a descent direction which can then be passed to a first-order optimizer. We analytically and numerically characterize the tradeoffs that result from tuning how strongly the search distribution is stretched along the guiding subspace, and we use this to derive a setting of the hyperparameters that works well across problems. Finally, we apply our method to example problems, demonstrating an improvement over both standard evolutionary strategies and first-order methods (that directly follow the surrogate gradient). We provide a demo of Guided ES at this https URL.
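
To make the construction above concrete, here is a minimal NumPy sketch of one Guided ES-style update, assuming the search covariance is a mixture of an isotropic component and a component along an orthonormal basis of the surrogate-gradient subspace, with an antithetic (mirrored-pair) gradient estimate fed to a plain SGD step. The function name `guided_es_step`, the default hyperparameters, and the SGD update are illustrative assumptions, not the authors' released demo.

```python
import numpy as np

def guided_es_step(f, x, surrogate_grads, sigma=0.1, alpha=0.5, pairs=10, lr=0.1):
    """One illustrative Guided ES-style update (sketch, not a reference implementation).

    f               : scalar objective to minimize
    x               : current parameters, shape (n,)
    surrogate_grads : k surrogate gradient directions, shape (n, k)
    alpha           : trades off isotropic search (alpha=1) against the guiding subspace (alpha->0)
    """
    n, k = surrogate_grads.shape
    # Orthonormal basis U of the guiding subspace spanned by the surrogate gradients.
    U, _ = np.linalg.qr(surrogate_grads)

    grad_est = np.zeros(n)
    for _ in range(pairs):
        # Sample a perturbation from a Gaussian elongated along the guiding subspace:
        #   Cov = sigma^2 * (alpha/n * I + (1 - alpha)/k * U U^T)
        eps = sigma * (np.sqrt(alpha / n) * np.random.randn(n)
                       + np.sqrt((1.0 - alpha) / k) * U @ np.random.randn(k))
        # Antithetic function evaluations accumulate into a descent-direction estimate.
        grad_est += (f(x + eps) - f(x - eps)) * eps
    grad_est /= 2.0 * sigma ** 2 * pairs

    # Pass the estimated descent direction to a first-order optimizer (plain SGD here).
    return x - lr * grad_est
```

The `alpha` parameter interpolates between vanilla evolutionary strategies (isotropic search, `alpha = 1`) and search confined to the surrogate subspace (`alpha` near 0); the tradeoff characterized in the paper concerns exactly this choice.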
