Is there an analog of Nesterov acceleration for gradient-based MCMC?

We formulate gradient-based Markov chain Monte Carlo (MCMC) sampling as optimization on the space of probability measures, with Kullback–Leibler (KL) divergence as the objective functional. We show that an underdamped form of the Langevin algorithm performs accelerated gradient descent on this objective. To characterize its convergence, we construct a Lyapunov functional and exploit the hypocoercivity of the underdamped Langevin dynamics. As an application, we show that accelerated rates can be obtained for a class of nonconvex functions with the Langevin algorithm.
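
To make the object of study concrete, here is a minimal sketch of the underdamped (kinetic) Langevin dynamics that the abstract refers to, discretized with a plain Euler–Maruyama scheme. This is an illustration only, not the paper's discretization or analysis; the friction parameter gamma, the inverse-mass parameter u, and the step size are hypothetical choices for the example.

```python
import numpy as np

def underdamped_langevin(grad_f, x0, gamma=2.0, u=1.0, step=1e-2,
                         n_steps=10_000, rng=None):
    """Euler-Maruyama sketch of the underdamped Langevin SDE:

        dx = v dt
        dv = -gamma * v dt - u * grad_f(x) dt + sqrt(2 * gamma * u) dB_t

    which targets (approximately) the density proportional to exp(-f(x)).
    NOTE: a simple illustrative discretization; the paper's rates rely on
    a finer analysis of the continuous-time dynamics.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)          # momentum variable
    samples = np.empty((n_steps, x.size))
    for i in range(n_steps):
        noise = rng.standard_normal(x.size)
        # Update momentum: friction, gradient force, and Brownian kick.
        v = v - step * (gamma * v + u * grad_f(x)) \
              + np.sqrt(2.0 * gamma * u * step) * noise
        # Update position along the momentum.
        x = x + step * v
        samples[i] = x
    return samples

# Example: sample from a standard 2D Gaussian, f(x) = ||x||^2 / 2.
samples = underdamped_langevin(grad_f=lambda x: x, x0=np.zeros(2))
```

The momentum variable v plays the role of the velocity term in Nesterov's method: the overdamped Langevin algorithm updates x directly from the gradient, whereas the underdamped form accumulates gradients in v, which is the mechanism behind the acceleration analyzed in the paper.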
