LM-CMA: An Alternative to L-BFGS for Large-Scale Black Box Optimization

Limited-memory BFGS (L-BFGS; Liu and Nocedal, 1989) is often considered the method of choice for continuous optimization when first- or second-order information is available. However, the use of L-BFGS can be complicated in a black box scenario where gradient information is not available and must therefore be estimated numerically. The accuracy of this estimation, obtained by finite difference methods, is often problem-dependent and may lead to premature convergence of the algorithm. This article demonstrates an alternative to L-BFGS, the limited memory covariance matrix adaptation evolution strategy (LM-CMA) proposed by Loshchilov (2014). LM-CMA is a stochastic derivative-free algorithm for numerical optimization of nonlinear, nonconvex problems. Inspired by L-BFGS, LM-CMA samples candidate solutions according to a covariance matrix reproduced from m direction vectors selected during the optimization process. The decomposition of the covariance matrix into Cholesky factors reduces the memory complexity to O(mn), where n is the number of decision variables. The time complexity of sampling one candidate solution is also O(mn), but amounts in practice to only about 25 scalar-vector multiplications. The algorithm is invariant with respect to strictly increasing transformations of the objective function: such transformations do not compromise its ability to approach the optimum. LM-CMA outperforms the original CMA-ES and its large-scale versions on nonseparable ill-conditioned problems, by a factor that grows with problem dimension. Its invariance properties do not prevent it from performing comparably to L-BFGS on nontrivial large-scale smooth and nonsmooth optimization problems.
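To make the sampling mechanism concrete, below is a minimal Python sketch of how a candidate can be drawn from the covariance matrix implied by m stored direction vectors without ever forming the n-by-n matrix. It is an illustration, not the author's reference implementation: the function names and the placeholder state (P, V, b, a) are assumptions; in LM-CMA proper the pairs (p_j, v_j) come from the evolution path, v_j = A_{j-1}^{-1} p_j is maintained with an analogous inverse-factor recursion, and the scalars a and b_j are set from the covariance learning rate as specified by Loshchilov (2014).

    import numpy as np

    def apply_cholesky_factor(z, P, V, b, a):
        # Compute x = A z for the Cholesky factor defined recursively as
        # A_j = a * A_{j-1} + b_j * p_j * v_j^T, with A_0 = I. Unrolling gives
        # A_m z = a * (A_{m-1} z) + b_m * (v_m . z) * p_m, so m dot products and
        # m scaled vector additions suffice: O(mn) time, and only the m pairs
        # (p_j, v_j) are stored: O(mn) memory.
        x = z.copy()
        for p_j, v_j, b_j in zip(P, V, b):  # oldest stored vector first
            x = a * x + b_j * np.dot(v_j, z) * p_j
        return x

    def sample_candidate(mean, sigma, P, V, b, a, rng):
        # Draw x ~ N(mean, sigma^2 * A A^T) via x = mean + sigma * A z, z ~ N(0, I).
        z = rng.standard_normal(mean.size)
        return mean + sigma * apply_cholesky_factor(z, P, V, b, a)

    # Toy usage with hypothetical state (real values come from the CMA update):
    n, m = 1000, 10
    rng = np.random.default_rng(0)
    P = [rng.standard_normal(n) for _ in range(m)]  # stored direction vectors
    V = [p.copy() for p in P]                       # placeholder for A_{j-1}^{-1} p_j
    b = [0.05] * m                                  # placeholder update weights
    x = sample_candidate(np.zeros(n), 1.0, P, V, b, a=0.95, rng=rng)

The invariance claimed above follows from the fact that candidates are used only through the ranking of their objective values: replacing f by g(f) for any strictly increasing g leaves every comparison, and hence the entire search trajectory, unchanged.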

[1] Nikolaus Hansen, et al. Adaptive Encoding: How to Render Search Coordinate System Invariant. PPSN, 2008.

[2] Michèle Sebag, et al. Maximum Likelihood-Based Online Adaptation of Hyper-Parameters in CMA-ES. PPSN, 2014.

[3] Anne Auger, et al. Mirrored Sampling and Sequential Selection for Evolution Strategies. PPSN, 2010.

[4] Anne Auger, et al. Comparison-based natural gradient optimization in high dimension. GECCO, 2014.

[5] James N. Knight, et al. Reducing the space-time complexity of the CMA-ES. GECCO, 2007.

[6] Petros Koumoutsakos, et al. Reducing the Time Complexity of the Derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES). Evolutionary Computation, 2003.

[7] Raymond Ros, et al. Benchmarking a weighted negative covariance matrix update on the BBOB-2010 noiseless testbed. GECCO, 2010.

[8] Anne Auger, et al. Principled Design of Continuous Stochastic Search: From Theory to Practice. Theory and Principled Methods for the Design of Metaheuristics, 2014.

[9] P. Wolfe. Convergence Conditions for Ascent Methods. II. 1969.

[10] Francisco Herrera, et al. A study on the use of non-parametric tests for analyzing the evolutionary algorithms' behaviour. Journal of Heuristics, 2009.

[11] Christian Igel, et al. Efficient covariance matrix update for variable metric evolution strategies. Machine Learning, 2009.

[12] Dirk V. Arnold, et al. Improving Evolution Strategies through Active Covariance Matrix Adaptation. IEEE International Conference on Evolutionary Computation, 2006.

[13] Anne Auger, et al. Linear Convergence of Comparison-based Step-size Adaptive Randomized Search via Stability of Markov Chains. SIAM Journal on Optimization, 2013.

[14] Dirk V. Arnold, et al. On the Behaviour of the (1, λ)-ES for Conically Constrained Linear Problems. Evolutionary Computation, 2014.

[15] Michèle Sebag, et al. Self-adaptive surrogate-assisted covariance matrix adaptation evolution strategy. GECCO, 2012.

[16] Ingo Rechenberg. Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. 1973.

[17] M. Brand, et al. Fast low-rank modifications of the thin singular value decomposition. 2006.

[18] Xin Yao, et al. Fast Evolution Strategies. Evolutionary Programming, 1997.

[19] Anne Auger, et al. Evolution Strategies. Handbook of Computational Intelligence, 2018.

[20] Michèle Sebag, et al. Bi-population CMA-ES algorithms with surrogate models and line searches. GECCO, 2013.

[21] Ilya Loshchilov, et al. A computationally efficient limited memory CMA-ES for large scale optimization. GECCO, 2014.

[22] Nikolaus Hansen, et al. Completely Derandomized Self-Adaptation in Evolution Strategies. Evolutionary Computation, 2001.

[23] Ilya Loshchilov, et al. CMA-ES with restarts for solving CEC 2013 benchmark problems. IEEE Congress on Evolutionary Computation, 2013.

[24] P. Wolfe. Convergence Conditions for Ascent Methods. SIAM Review, 1969.

[25] Alex A. Freitas, et al. Evolutionary Computation. 2002.

[26] Tom Schaul, et al. Natural Evolution Strategies. IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence), 2008.

[27] Tobias Glasmachers. Convergence of the IGO-Flow of Isotropic Gaussian Distributions on Convex Quadratic Problems. PPSN, 2012.

[28] Mohamed-Jalal Fadili, et al. A quasi-Newton proximal splitting method. NIPS, 2012.

[29] Petros Koumoutsakos, et al. Local Meta-models for Optimization Using Evolution Strategies. PPSN, 2006.

[30] Nikolaus Hansen, et al. Adapting arbitrary normal mutation distributions in evolution strategies: the covariance matrix adaptation. Proceedings of the IEEE International Conference on Evolutionary Computation, 1996.

[31] Petros Koumoutsakos, et al. A Method for Handling Uncertainty in Evolutionary Optimization With an Application to Feedback Control of Combustion. IEEE Transactions on Evolutionary Computation, 2009.

[32] Francisco Herrera, et al. A study on the use of non-parametric tests for analyzing the evolutionary algorithms' behaviour: a case study on the CEC'2005 Special Session on Real Parameter Optimization. Journal of Heuristics, 2009.

[33] Ilya Loshchilov. Surrogate-Assisted Evolutionary Algorithms. 2013.

[34] Michèle Sebag, et al. Adaptive coordinate descent. GECCO, 2011.

[35] K. Steiglitz, et al. Adaptive step size random search. 1968.

[36] Charles Audet, et al. Convergence of Mesh Adaptive Direct Search to Second-Order Stationary Points. SIAM Journal on Optimization, 2006.

[37] Anne Auger, et al. BBOB 2009: Comparison Tables of All Algorithms on All Noiseless Functions. 2010.

[38] Raymond Ros, et al. A Simple Modification in CMA-ES Achieving Linear Time and Space Complexity. PPSN, 2008.

[39] John E. Dennis, et al. Numerical methods for unconstrained optimization and nonlinear equations. Prentice Hall Series in Computational Mathematics, 1983.

[40] Anne Auger, et al. A median success rule for non-elitist evolution strategies: study of feasibility. GECCO, 2013.

[41] Youhei Akimoto, et al. Objective improvement in information-geometric optimization. FOGA XII, 2013.

[42] Nikolaus Hansen, et al. The CMA Evolution Strategy: A Comparing Review. Towards a New Evolutionary Computation, 2006.

[43] C.-S. Chien, et al. Effective condition number for finite difference method. 2007.

[44] Raymond Ros, et al. Real-Parameter Black-Box Optimization Benchmarking 2009: Experimental Setup. 2009.

[45] D. Shanno. Conditioning of Quasi-Newton Methods for Function Minimization. 1970.

[46] J. Nocedal. Updating Quasi-Newton Matrices With Limited Storage. 1980.

[47] Stefan Roth, et al. Covariance Matrix Adaptation for Multi-objective Optimization. Evolutionary Computation, 2007.

[48] Jorge Nocedal, et al. A Limited Memory Algorithm for Bound Constrained Optimization. SIAM Journal on Scientific Computing, 1995.

[49] Quoc V. Le, et al. On optimization methods for deep learning. ICML, 2011.

[50] Anne Auger, et al. Information-Geometric Optimization Algorithms: A Unifying Picture via Invariance Principles. Journal of Machine Learning Research, 2011.

[51] Anne Auger, et al. Impacts of invariance in search: When CMA-ES and PSO face ill-conditioned and non-separable problems. Applied Soft Computing, 2011.

[52] Hans-Georg Beyer, et al. Convergence Analysis of Evolutionary Algorithms That Are Based on the Paradigm of Information Geometry. Evolutionary Computation, 2014.

[53] Anne Auger, et al. How to Assess Step-Size Adaptation Mechanisms in Randomised Search. PPSN, 2014.

[54] Tom Schaul, et al. A linear time natural evolution strategy for non-separable functions. GECCO, 2011.

[55] Jorge Nocedal, et al. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 1989.