Deriving and improving CMA-ES with information geometric trust regions

CMA-ES is one of the most popular stochastic search algorithms. It performs favourably in many tasks without the need for extensive parameter tuning. The algorithm has many beneficial properties, including automatic step-size adaptation, efficient covariance matrix updates that incorporate the current samples as well as the evolution path, and invariance properties. Its update rules are composed of well-established heuristics, and the theoretical foundations of some of these rules are also well understood. In this paper we fully derive all CMA-ES update rules within the framework of expectation-maximisation-based stochastic search algorithms using information-geometric trust regions. We show that the trust region yields updates similar to those of CMA-ES for the mean and the covariance matrix, while it also allows for the derivation of an improved update rule for the step-size. Our new algorithm, Trust-Region Covariance Matrix Adaptation Evolution Strategy (TR-CMA-ES), is fully derived from first-order optimization principles and performs favourably compared to the standard CMA-ES algorithm.
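To make the kind of update described above concrete, the sketch below implements one iteration of a trust-region-constrained, expectation-maximisation-style stochastic search step for a Gaussian search distribution. It is a minimal illustration under assumed design choices: exponential fitness weighting with a fixed temperature, and a KL bound `epsilon` enforced by bisecting an interpolation coefficient between the old parameters and the weighted maximum-likelihood parameters (which assumes the KL grows monotonically along that path). It is not the paper's actual TR-CMA-ES derivation, which in particular treats the step-size separately from the covariance shape and incorporates the evolution path.

```python
import numpy as np

def kl_gauss(m0, C0, m1, C1):
    """KL divergence KL(N(m1, C1) || N(m0, C0)) between two Gaussians."""
    d = m0.size
    C0_inv = np.linalg.inv(C0)
    diff = m0 - m1
    return 0.5 * (np.trace(C0_inv @ C1)
                  + diff @ C0_inv @ diff
                  - d
                  + np.log(np.linalg.det(C0) / np.linalg.det(C1)))

def tr_step(f, m, C, n_samples=64, epsilon=0.1, temperature=1.0, rng=None):
    """One KL-constrained weighted maximum-likelihood update of a
    Gaussian search distribution N(m, C); f is the objective (minimised)."""
    rng = np.random.default_rng() if rng is None else rng
    X = rng.multivariate_normal(m, C, size=n_samples)
    fx = np.array([f(x) for x in X])
    # Exponential weighting of samples (lower objective -> higher weight).
    w = np.exp(-(fx - fx.min()) / temperature)
    w /= w.sum()
    # Weighted maximum-likelihood estimates (the "M-step").
    m_ml = w @ X
    diffs = X - m_ml
    C_ml = (w[:, None] * diffs).T @ diffs + 1e-10 * np.eye(m.size)
    # Enforce KL(new || old) <= epsilon by bisecting on an interpolation
    # coefficient t between the old and the ML parameters.
    lo, hi = 0.0, 1.0
    for _ in range(30):
        t = 0.5 * (lo + hi)
        m_new = (1 - t) * m + t * m_ml
        C_new = (1 - t) * C + t * C_ml
        if kl_gauss(m, C, m_new, C_new) <= epsilon:
            lo = t
        else:
            hi = t
    t = lo
    return (1 - t) * m + t * m_ml, (1 - t) * C + t * C_ml

# Toy usage: minimise the sphere function from a standard-normal start.
m, C = np.zeros(5), np.eye(5)
sphere = lambda x: float(x @ x)
for _ in range(100):
    m, C = tr_step(sphere, m, C)
```

The bisection is a pragmatic stand-in for solving the KL-constrained maximum-likelihood problem in closed form via Lagrangian duality, as trust-region stochastic search methods typically do; the temperature, sample size, and bound `epsilon` are illustrative values, not tuned settings.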
