CoNES: Convex Natural Evolutionary Strategies

We present convex natural evolutionary strategies (CoNES), a novel algorithm for optimizing high-dimensional blackbox functions that leverages tools from convex optimization and information geometry. CoNES is formulated as an efficiently solvable convex program that adapts the evolutionary strategies (ES) gradient estimate to promote rapid convergence. The resulting algorithm is invariant to the parameterization of the belief distribution. Our numerical results demonstrate that CoNES vastly outperforms conventional blackbox optimization methods on a suite of functions used for benchmarking blackbox optimizers. Furthermore, CoNES converges faster than conventional blackbox methods on a selection of OpenAI Gym's MuJoCo reinforcement learning tasks for locomotion.
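
To make the idea of adapting an ES gradient estimate through a convex program concrete, the following is a minimal sketch and not the CoNES program itself, whose exact formulation is not given here. It assumes an isotropic Gaussian belief distribution N(mu, sigma^2 I) with fixed sigma, an antithetic ES gradient estimate, and a quadratic (Fisher-metric) trust region whose solution is available in closed form. The function names `es_gradient`, `trust_region_step`, and `minimize`, as well as all default parameter values, are hypothetical choices made for illustration.

```python
import numpy as np

def es_gradient(f, mu, sigma, n_pairs, rng):
    """Antithetic Monte Carlo estimate of the gradient of
    E_{x ~ N(mu, sigma^2 I)}[f(x)] with respect to mu."""
    eps = rng.standard_normal((n_pairs, mu.size))
    f_plus = np.array([f(mu + sigma * e) for e in eps])
    f_minus = np.array([f(mu - sigma * e) for e in eps])
    return ((f_plus - f_minus)[:, None] * eps).mean(axis=0) / (2.0 * sigma)

def trust_region_step(grad, fisher, radius):
    """Closed-form solution of the convex program
        minimize  grad^T d   subject to  0.5 * d^T F d <= radius,
    i.e. a natural-gradient direction scaled to the trust-region boundary."""
    nat = np.linalg.solve(fisher, grad)              # F^{-1} grad
    scale = np.sqrt(2.0 * radius / max(grad @ nat, 1e-12))
    return -scale * nat

def minimize(f, mu0, sigma=0.1, radius=1e-2, n_pairs=20, iters=300, seed=0):
    """Illustrative loop: estimate the ES gradient, then take the
    trust-region step. Only the mean is updated (an assumption of this sketch)."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu0, dtype=float)
    # Fisher information of N(mu, sigma^2 I) with respect to mu.
    fisher = np.eye(mu.size) / sigma**2
    for _ in range(iters):
        g = es_gradient(f, mu, sigma, n_pairs, rng)
        mu = mu + trust_region_step(g, fisher, radius)
    return mu

if __name__ == "__main__":
    sphere = lambda x: float(np.sum(x**2))
    mu0 = np.ones(5)
    mu = minimize(sphere, mu0)
    print(sphere(mu0), sphere(mu))
```

Because the step is constrained in the Fisher metric rather than the Euclidean one, its length is measured in terms of the change in the belief distribution rather than in raw parameter units. This is the standard motivation for natural-gradient methods, and the sketch above covers only this simplest case.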
