Comparison-based natural gradient optimization in high dimension

We propose VD-CMA, a natural-gradient-based stochastic search algorithm for the optimization of high-dimensional numerical functions. The algorithm is comparison-based and hence invariant under monotonic transformations of the objective function. It adapts a multivariate normal distribution with a restricted covariance matrix whose number of degrees of freedom is twice the dimension, representing an arbitrarily oriented long axis together with additional axis-parallel scaling. We derive the different components of the algorithm and show that its internal time and space complexity is linear in the dimension. Empirically, the algorithm adapts its covariance matrix to the inverse Hessian on convex-quadratic functions whose Hessian has one short axis and different scaling on the diagonal. We then evaluate VD-CMA on test functions and compare it to other methods. On functions covered by the internal model of VD-CMA and on the Rosenbrock function, VD-CMA outperforms CMA-ES (which has quadratic internal time and space complexity) not only in internal complexity but also in the number of function evaluations as the dimension increases.
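The abstract describes the restricted model only qualitatively. One parameterization consistent with "an arbitrarily oriented long axis and additional axis-parallel scaling" with 2n degrees of freedom is C = D(I + vv^T)D, where D is a diagonal scaling matrix and v is a single direction vector. The sketch below (the function name and interface are hypothetical, not taken from the paper) shows how, under this assumed parameterization, one sample from N(m, sigma^2 D(I + vv^T)D) can be drawn in O(n) time, which is the kind of structure that makes linear internal complexity possible:

```python
import numpy as np

def sample_vd(m, sigma, d, v, rng):
    """Draw one sample from N(m, sigma^2 * D (I + v v^T) D) in O(n) time.

    D = diag(d) provides the axis-parallel scaling; v encodes the single
    arbitrarily oriented long axis. Parameterization assumed from the
    abstract's description, not quoted from the paper.
    """
    n = len(m)
    z = rng.standard_normal(n)        # z ~ N(0, I)
    vv = v @ v                        # ||v||^2
    if vv > 0.0:
        vhat = v / np.sqrt(vv)
        # Stretch the component of z along vhat so that Cov(y) = I + v v^T.
        # With y = z + a (vhat . z) vhat, Cov(y) = I + (2a + a^2) vhat vhat^T,
        # so we need (1 + a)^2 = 1 + ||v||^2.
        a = np.sqrt(1.0 + vv) - 1.0
        y = z + a * (vhat @ z) * vhat
    else:
        y = z
    return m + sigma * d * y          # apply D via an elementwise product

# Usage: one 1000-dimensional sample, still linear time and space.
rng = np.random.default_rng(0)
n = 1000
x = sample_vd(np.zeros(n), 0.5, np.ones(n),
              rng.standard_normal(n) / np.sqrt(n), rng)
```

Because the sample never materializes the n-by-n covariance matrix, both sampling and storage stay O(n), in contrast to the O(n^2) cost of standard CMA-ES.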
