An ODE method to prove the geometric convergence of adaptive stochastic algorithms

We develop a methodology to prove geometric convergence of the parameter sequence $\{\theta_n\}_{n\geq 0}$ of a stochastic algorithm. The convergence is measured via a function $\Psi$ that is similar to a Lyapunov function. The methodology is motivated by stochastic algorithms that derive from optimization methods for solving deterministic optimization problems. Among them, we are especially interested in analyzing comparison-based algorithms, which can typically be cast as stochastic approximation algorithms with a constant step-size. We employ the so-called ODE method, which relates a stochastic algorithm to its mean ODE, together with the Lyapunov-like function $\Psi$, chosen so that the geometric convergence of $\Psi(\theta_n)$ implies, in the case of a stochastic optimization algorithm, the geometric convergence of the expected distance between the optimum of the optimization problem and the search point generated by the algorithm. We provide two sufficient conditions under which $\Psi(\theta_n)$ decreases at a geometric rate. First, $\Psi$ should decrease "exponentially" along the solution to the mean ODE. Second, the deviation between the stochastic algorithm and the ODE solution (measured with the function $\Psi$) should be bounded by $\Psi(\theta_n)$ times a constant. In addition, we provide practical conditions that make the two sufficient conditions easy to verify without, in particular, knowing the solution of the mean ODE. Our results are any-time bounds on $\Psi(\theta_n)$, so we can deduce not only an asymptotic upper bound on the convergence rate but also a bound on the first hitting time of the algorithm. The main results are applied to two comparison-based stochastic algorithms with a constant step-size for optimization on discrete and continuous domains.
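As a schematic illustration of the two sufficient conditions described above (the notation $\varphi$, $c$, $C$, and $\epsilon$ is hypothetical and introduced here only for exposition, not taken from the paper itself), the structure of the argument can be sketched as follows:

```latex
% Let \varphi(t,\theta) denote the flow of the mean ODE started at \theta,
% and let \epsilon > 0 be the constant step-size of the algorithm.
%
% Condition 1: exponential decrease of \Psi along the ODE solution,
%   \Psi\bigl(\varphi(t,\theta)\bigr) \le e^{-c t}\,\Psi(\theta)
%   \quad \text{for all } t \ge 0, \text{ for some } c > 0.
%
% Condition 2: the deviation between one step of the stochastic
% algorithm and the ODE solution, measured with \Psi, is bounded
% by \Psi(\theta_n) times a constant,
%   \mathbb{E}\bigl[\,\bigl|\Psi(\theta_{n+1})
%       - \Psi\bigl(\varphi(\epsilon,\theta_n)\bigr)\bigr|
%       \,\big|\, \theta_n \bigr] \le C\,\epsilon\,\Psi(\theta_n).
%
% Heuristically, combining the two conditions gives, for \epsilon
% small enough that \gamma := e^{-c\epsilon} + C\epsilon < 1, the
% any-time geometric bound
%   \mathbb{E}\bigl[\Psi(\theta_n)\bigr] \le \gamma^{\,n}\,\Psi(\theta_0).
```

A geometric any-time bound of this form yields both an asymptotic convergence rate and, via a Markov-type argument, an upper bound on the expected first hitting time of a sublevel set of $\Psi$.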
