Sharp Convergence Rates for Langevin Dynamics in the Nonconvex Setting

We study the problem of sampling from a distribution $p^*(x) \propto \exp\left(-U(x)\right)$, where the potential $U$ is $L$-smooth everywhere and $m$-strongly convex outside a ball of radius $R$, but potentially nonconvex inside this ball. We analyze both overdamped and underdamped Langevin MCMC and establish upper bounds on the number of steps required to obtain a sample from a distribution that is within $\epsilon$ of $p^*$ in $1$-Wasserstein distance. For the first-order method (overdamped Langevin MCMC), the iteration complexity is $\tilde{\mathcal{O}}\left(e^{cLR^2}d/\epsilon^2\right)$, where $d$ is the dimension of the underlying space. For the second-order method (underdamped Langevin MCMC), the iteration complexity is $\tilde{\mathcal{O}}\left(e^{cLR^2}\sqrt{d}/\epsilon\right)$. Here $c$ is an explicit positive constant. Surprisingly, the iteration complexity of both algorithms is only polynomial in the dimension $d$ and the inverse target accuracy $1/\epsilon$. It is exponential, however, in the problem parameter $LR^2$, which measures the non-log-concavity of the target distribution.
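
As a rough illustration of the first-order method, the sketch below runs the standard overdamped Langevin update $x_{k+1} = x_k - \delta \nabla U(x_k) + \sqrt{2\delta}\,\xi_k$ with $\xi_k \sim \mathcal{N}(0, I)$. Everything specific here is an assumption made for the example and is not taken from the analysis: the potential (an equal mixture of two unit-variance Gaussians centered at $\pm a$, which is smooth and nonconvex only near the origin), the step size $\delta$, and the helper names `grad_U` and `overdamped_langevin`. The underdamped method is not shown.

```python
import numpy as np


def grad_U(x, a):
    """Gradient of U(x) = ||x||^2 / 2 - log cosh(a . x) + const.

    This U is the negative log-density of an equal mixture of N(a, I) and
    N(-a, I) (an illustrative choice). `x` has shape (n_chains, d), `a` has
    shape (d,).
    """
    return x - np.tanh(x @ a)[:, None] * a


def overdamped_langevin(grad, x0, step, n_steps, rng):
    """Run n_steps of x <- x - step * grad(x) + sqrt(2 * step) * noise.

    `x0` has shape (n_chains, d); all chains are advanced in parallel and
    the final positions are returned.
    """
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - step * grad(x) + np.sqrt(2.0 * step) * noise
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = np.array([2.0, 0.0])          # mode centers at +a and -a (assumed)
    x0 = np.zeros((1000, 2))          # start every chain at the origin
    samples = overdamped_langevin(lambda x: grad_U(x, a), x0,
                                  step=0.05, n_steps=400, rng=rng)
    # Empirical second moments per coordinate; the target has 1 + a_i^2.
    print((samples ** 2).mean(axis=0))
```

The step size is fixed here for simplicity; the bounds in the abstract correspond to choosing it as a function of $\epsilon$, $d$, $L$, and $R$, which this sketch does not attempt.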
