A Discrete Version of CMA-ES

Modern machine learning uses increasingly advanced optimization techniques to find optimal hyperparameters. Whenever the objective function is non-convex, non-continuous, and potentially has multiple local minima, standard gradient-descent methods fail. A last-resort and very different approach is to assume that the optima (not necessarily unique) are distributed according to some probability distribution, and to iteratively adapt that distribution based on the points tested so far. These strategies, which originated in the early 1960s under the name Evolution Strategies (ES), culminated in CMA-ES (Covariance Matrix Adaptation Evolution Strategy). CMA-ES relies on a multivariate normal distribution and is regarded as the state of the art for general black-box optimization, but it is far from optimal for discrete variables. In this paper, we extend the method to multivariate correlated binomial distributions. We show that such a distribution shares key features with the multivariate normal: zero correlation is equivalent to independence, and correlation is efficiently modeled through interaction terms between variables. We discuss this distribution in the framework of the exponential family. We prove that the model can capture not only pairwise interactions between variables but also higher-order interactions. This allows us to create a version of CMA-ES that efficiently accommodates discrete variables. We provide the corresponding algorithm and conclude.
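To make the construction concrete: in the exponential-family view, a correlated multivariate binary model can be written as p(x) ∝ exp(Σ_i θ_i x_i + Σ_{i<j} θ_{ij} x_i x_j + ...), where the pairwise and higher-order terms θ_{ij} play the role that the covariance matrix plays in the Gaussian case. Below is a minimal Python sketch of the simplest, interaction-free special case, in which the distribution reduces to independent Bernoulli variables whose probability vector is adapted toward elite samples (a PBIL-style update). The function name discrete_es, the learning rate lr, and the clipping bounds are illustrative assumptions; this is not the paper's full algorithm, which also adapts the interaction terms.

import numpy as np

def discrete_es(objective, dim, pop_size=50, elite_frac=0.2,
                lr=0.1, iters=200, seed=0):
    """Minimal evolution-strategy sketch over binary variables.

    Samples a population from a product-of-Bernoulli model and moves
    the probability vector toward the elite (lowest-objective) samples.
    This is the independence (no-interaction) special case of a
    binomial-based CMA-ES; the full model would also adapt pairwise
    and higher-order interaction terms.
    """
    rng = np.random.default_rng(seed)
    p = np.full(dim, 0.5)                         # Bernoulli success probabilities
    n_elite = max(1, int(elite_frac * pop_size))
    best_x, best_f = None, np.inf
    for _ in range(iters):
        X = (rng.random((pop_size, dim)) < p).astype(int)  # sample population
        f = np.array([objective(x) for x in X])
        if f.min() < best_f:                      # track the best point seen
            best_f, best_x = f.min(), X[np.argmin(f)].copy()
        elite = X[np.argsort(f)[:n_elite]]        # select the best individuals
        p = (1.0 - lr) * p + lr * elite.mean(axis=0)  # adapt the distribution
        p = np.clip(p, 0.05, 0.95)                # keep some exploration
    return best_x, best_f

# Usage example: minimize the number of ones in a 20-bit string.
x_best, f_best = discrete_es(lambda x: x.sum(), dim=20)

The clipping step mirrors the usual safeguard in estimation-of-distribution algorithms: without it the probabilities can saturate at 0 or 1 and the sampler loses diversity before convergence.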
