Theoretical Foundation for CMA-ES from Information Geometry Perspective

This paper explores the theoretical basis of the covariance matrix adaptation evolution strategy (CMA-ES) from an information geometry viewpoint. To establish a theoretical foundation for the CMA-ES, we focus on the geometric structure of a Riemannian manifold of probability distributions equipped with the Fisher metric. We define a function on the manifold, namely the expectation of fitness over the sampling distribution, and regard the goal of the parameter update of the sampling distribution in the CMA-ES as maximization of this expected fitness. We investigate steepest ascent learning for expected fitness maximization, where the steepest ascent direction is given by the natural gradient, that is, the product of the inverse of the Fisher information matrix and the conventional gradient of the function. Our first result is that, under certain parameterizations of the multivariate normal distribution, the natural gradient of the expected fitness can be obtained without inverting the Fisher information matrix. We find that the update of the distribution parameters in the CMA-ES is the same as natural gradient learning for expected fitness maximization. Our second result is a derivation of the range of learning rates for which a step in the direction of the exact natural gradient improves the parameters in terms of the expected fitness. From the close relation between the CMA-ES and natural gradient learning, we see that the default learning rate setting in the CMA-ES appears suitable in terms of monotone improvement of the expected fitness. Finally, we discuss the relation to the expectation-maximization (EM) framework and provide an information geometric interpretation of the CMA-ES.
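
To fix notation for this summary (the notation here is ours, not necessarily the paper's), a minimal sketch of the quantities involved: the expected fitness under the sampling distribution $p_\theta$ and the natural gradient ascent update read

\[ J(\theta) = \mathbb{E}_{p_\theta}[f(x)], \qquad \tilde{\nabla} J(\theta) = F(\theta)^{-1} \nabla_\theta J(\theta), \qquad \theta \leftarrow \theta + \eta\, \tilde{\nabla} J(\theta), \]

where $F(\theta) = \mathbb{E}_{p_\theta}\!\big[\nabla_\theta \ln p_\theta(x)\, \nabla_\theta \ln p_\theta(x)^{\top}\big]$ is the Fisher information matrix and $\eta$ is a learning rate. For the mean-and-covariance parameterization of a multivariate normal distribution $\mathcal{N}(m, C)$, a Monte Carlo estimate of the natural gradient from samples $x_1, \dots, x_\lambda$ with fitness-based weights $w_i$ takes the closed form (a standard result in the natural evolution strategies literature, requiring no explicit inversion of $F$)

\[ \tilde{\nabla}_m J \approx \sum_{i=1}^{\lambda} w_i \,(x_i - m), \qquad \tilde{\nabla}_C J \approx \sum_{i=1}^{\lambda} w_i \big[ (x_i - m)(x_i - m)^{\top} - C \big], \]

which mirrors the mean update and the rank-$\mu$ covariance update of the CMA-ES that the paper identifies with natural gradient learning.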
