Convergence Rates for Langevin Monte Carlo in the Nonconvex Setting

We study the problem of sampling from a distribution p∗(x) ∝ exp(−U(x)), where the function U is L-smooth everywhere and m-strongly convex outside a ball of radius R, but potentially nonconvex inside this ball. We study both overdamped and underdamped Langevin MCMC and establish upper bounds on the number of steps required to obtain a sample from a distribution that is within ε of p∗ in 1-Wasserstein distance. For the first-order method (overdamped Langevin MCMC), the iteration complexity is Õ(exp(cLR²) · d/ε²), where d is the dimension of the underlying space. For the second-order method (underdamped Langevin MCMC), the iteration complexity is Õ(exp(cLR²) · √d/ε), for an explicit positive constant c. Surprisingly, the iteration complexity for both of these algorithms is only polynomial in the dimension d and the target accuracy ε. It is exponential, however, in the problem parameter LR², which is a measure of the non-log-concavity of the target distribution.
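For illustration, here is a minimal sketch of the two discretized dynamics the abstract refers to. The toy double-well potential, the step size, the friction parameter gamma, and the plain Euler-Maruyama update for the underdamped chain are illustrative assumptions of this sketch, not the paper's exact integrator or parameter choices.

```python
import numpy as np

def grad_U(x):
    # Illustrative potential gradient: double-well along the first coordinate
    # (nonconvex near the origin, strongly convex far away from it).
    g = x.copy()
    g[0] = x[0] ** 3 - x[0]
    return g

def overdamped_langevin(x0, step=1e-2, n_steps=10_000, rng=None):
    """Overdamped (unadjusted) Langevin MCMC:
    x <- x - step * grad_U(x) + sqrt(2 * step) * standard Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    samples = np.empty((n_steps, x.size))
    for k in range(n_steps):
        x = x - step * grad_U(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.size)
        samples[k] = x
    return samples

def underdamped_langevin(x0, step=1e-2, gamma=2.0, n_steps=10_000, rng=None):
    """Schematic Euler-Maruyama discretization of underdamped Langevin dynamics,
    which tracks a velocity v alongside the position x; the paper analyzes a
    sharper discretization, so this is only a conceptual stand-in."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    samples = np.empty((n_steps, x.size))
    for k in range(n_steps):
        v = (v - step * (gamma * v + grad_U(x))
             + np.sqrt(2.0 * gamma * step) * rng.standard_normal(x.size))
        x = x + step * v
        samples[k] = x
    return samples

if __name__ == "__main__":
    # Draw approximate samples in d = 2 dimensions with either chain.
    print(overdamped_langevin(np.zeros(2))[-5:])
    print(underdamped_langevin(np.zeros(2))[-5:])
```

The intuition behind the two bounds above is visible in the updates: the overdamped chain moves only through position space, while the underdamped chain integrates a velocity, which is what yields the improved √d/ε dependence in the paper's analysis.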
