High-Order Langevin Diffusion Yields an Accelerated MCMC Algorithm

We propose a Markov chain Monte Carlo (MCMC) algorithm based on third-order Langevin dynamics for sampling from distributions with log-concave and smooth densities. The higher-order dynamics allow for more flexible discretization schemes, and we develop a specific method that combines splitting with more accurate integration. For a broad class of $d$-dimensional distributions arising from generalized linear models, we prove that the resulting third-order algorithm produces samples from a distribution that is at most $\varepsilon > 0$ in Wasserstein distance from the target distribution in $O\left(\frac{d^{1/3}}{ \varepsilon^{2/3}} \right)$ steps. This result requires only Lipschitz conditions on the gradient. For general strongly convex potentials with $\alpha$-th order smoothness, we prove that the mixing time scales as $O \left(\frac{d^{1/3}}{\varepsilon^{2/3}} + \frac{d^{1/2}}{\varepsilon^{1/(\alpha - 1)}} \right)$.
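
To make the two stated rates concrete, the following minimal Python sketch evaluates the step-count expressions with all hidden constants set to 1. That choice is an assumption made purely for illustration, since the $O(\cdot)$ bounds do not specify constants. The function names `glm_steps` and `general_steps` are hypothetical and are not taken from the paper.

```python
def glm_steps(d: float, eps: float) -> float:
    """Scaling d^{1/3} / eps^{2/3} stated for the generalized linear model class.

    Constants hidden by the O(.) notation are set to 1 (illustrative assumption).
    """
    if d <= 0 or eps <= 0:
        raise ValueError("d and eps must be positive")
    return d ** (1.0 / 3.0) / eps ** (2.0 / 3.0)


def general_steps(d: float, eps: float, alpha: float) -> float:
    """Scaling d^{1/3}/eps^{2/3} + d^{1/2}/eps^{1/(alpha-1)} stated for general
    strongly convex potentials with alpha-th order smoothness.

    Constants are again set to 1 (illustrative assumption).
    """
    if alpha <= 1:
        raise ValueError("alpha must exceed 1 for the exponent 1/(alpha-1) to be defined")
    extra_term = d ** 0.5 / eps ** (1.0 / (alpha - 1.0))
    return glm_steps(d, eps) + extra_term


if __name__ == "__main__":
    # Compare the two expressions for a few smoothness orders.
    d, eps = 1000.0, 1e-2
    print("GLM-type rate:", glm_steps(d, eps))
    for alpha in (2.0, 3.0, 5.0):
        print(f"general rate, alpha={alpha}:", general_steps(d, eps, alpha))
```

With constants ignored, the exponent $1/(\alpha - 1)$ on $\varepsilon$ falls below $2/3$ once $\alpha > 5/2$, so for small $\varepsilon$ the additional term's dependence on $\varepsilon$ is then milder than that of the first term, although its dependence on $d$ remains $d^{1/2}$.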
