Faster Convergence of Stochastic Gradient Langevin Dynamics for Non-Log-Concave Sampling

We establish a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave. At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov chain. Under certain conditions on the target distribution, we prove that $\tilde O(d^4\epsilon^{-2})$ stochastic gradient evaluations suffice to guarantee $\epsilon$-sampling error in total variation distance, where $d$ is the problem dimension; this improves existing results on the convergence rate of SGLD (Raginsky et al., 2017; Xu et al., 2018). We further show that, under an additional Hessian Lipschitz condition on the log-density function, SGLD achieves $\epsilon$-sampling error within $\tilde O(d^{15/4}\epsilon^{-3/2})$ stochastic gradient evaluations. Our proof technique provides a new way to study the convergence of Langevin-based algorithms and sheds light on the design of fast stochastic gradient-based sampling algorithms.
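For concreteness, below is a minimal sketch of the SGLD iteration analyzed above, following Welling and Teh [6]: each step perturbs a minibatch stochastic gradient update with injected Gaussian noise whose scale is tied to the step size. The names (`grad_fi`, `eta`, `beta`, `batch_size`) and all default values are illustrative placeholders, not quantities from our analysis.

```python
import numpy as np

def sgld(grad_fi, x0, n_data, n_steps, eta=1e-3, beta=1.0, batch_size=32, seed=0):
    """Minimal SGLD sampler (a sketch, not the paper's exact setup).

    grad_fi(x, idx) -> average gradient over the component functions
    indexed by idx, i.e. an unbiased estimate of the full gradient of F,
    where the target density is proportional to exp(-beta * F(x)).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_steps):
        idx = rng.choice(n_data, size=batch_size, replace=False)  # minibatch
        g = grad_fi(x, idx)                    # stochastic gradient estimate
        noise = rng.standard_normal(x.shape)   # isotropic Gaussian injection
        # Euler-Maruyama step of the Langevin diffusion, with the full
        # gradient replaced by its minibatch estimate.
        x = x - eta * g + np.sqrt(2.0 * eta / beta) * noise
        samples.append(x.copy())
    return np.stack(samples)
```

As a toy usage check: with `grad_fi = lambda x, idx: x` (so $F(x) = \|x\|^2/2$ and every component gradient equals $x$), the iterates approximately sample from $N(0, \beta^{-1} I_d)$ once the step size is small and the chain has mixed.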

References

[1] Santosh S. Vempala, et al. Rapid Convergence of the Unadjusted Langevin Algorithm: Isoperimetry Suffices, 2019, NeurIPS.

[2] Santosh S. Vempala, et al. Convergence rate of Riemannian Hamiltonian Monte Carlo and faster polytope volume computation, 2017, STOC.

[3] Santosh S. Vempala, et al. Rapid Convergence of the Unadjusted Langevin Algorithm: Log-Sobolev Suffices, 2019, NeurIPS.

[4] Jian Peng, et al. Accelerating Nonconvex Learning via Replica Exchange Langevin Diffusion, 2020, ICLR.

[5] Gaël Richard, et al. Non-Asymptotic Analysis of Fractional Langevin Monte Carlo for Non-Convex Optimization, 2019, ICML.

[6] Yee Whye Teh, et al. Bayesian Learning via Stochastic Gradient Langevin Dynamics, 2011, ICML.

[7] Yee Whye Teh, et al. Exploration of the (Non-)Asymptotic Bias and Variance of Stochastic Gradient Langevin Dynamics, 2016, J. Mach. Learn. Res.

[8] Ohad Shamir. A Variant of Azuma's Inequality for Martingales with Subgaussian Tails, 2011, arXiv.

[9] Martin J. Wainwright, et al. Improved bounds for discretization of Langevin diffusions: Near-optimal rates without convexity, 2019, Bernoulli.

[10] Michael I. Jordan, et al. Sharp Convergence Rates for Langevin Dynamics in the Nonconvex Setting, 2018, arXiv.

[11] A. Eberle, et al. Coupling and convergence for Hamiltonian Monte Carlo, 2018, The Annals of Applied Probability.

[12] E. Vanden-Eijnden, et al. Non-asymptotic mixing of the MALA algorithm, 2010, arXiv:1008.3514.

[13] Michael I. Jordan, et al. Sampling can be faster than optimization, 2018, Proceedings of the National Academy of Sciences.

[14] M. Ledoux. A simple analytic proof of an inequality by P. Buser, 1994.

[15] Faming Liang, et al. Non-convex Learning via Replica Exchange Stochastic Gradient MCMC, 2020, ICML.

[16] Mert Gürbüzbalaban, et al. Global Convergence of Stochastic Gradient Hamiltonian Monte Carlo for Non-Convex Stochastic Optimization: Non-Asymptotic Performance Bounds and Momentum-Based Acceleration, 2018, Oper. Res.

[17] Mert Gürbüzbalaban, et al. Breaking Reversibility Accelerates Langevin Dynamics for Global Non-Convex Optimization, 2018, NeurIPS.

[18] A. Dalalyan. Theoretical guarantees for approximate sampling from smooth and log-concave densities, 2014, arXiv:1412.7392.

[19] É. Moulines, et al. Sampling from a strongly log-concave distribution with the Unadjusted Langevin Algorithm, 2016.

[20] R. Tweedie, et al. Exponential convergence of Langevin distributions and their discrete approximations, 1996.

[21] C. Hwang, et al. Diffusion for global optimization in $\mathbb{R}^n$, 1987.

[22] A. Eberle, et al. Couplings and quantitative contraction rates for Langevin dynamics, 2017, The Annals of Probability.

[23] R. Tweedie, et al. Rates of convergence of the Hastings and Metropolis algorithms, 1996.

[24] Quanquan Gu, et al. Stochastic Gradient Hamiltonian Monte Carlo Methods with Recursive Variance Reduction, 2019, NeurIPS.

[25] Santosh S. Vempala, et al. Eldan's Stochastic Localization and the KLS Hyperplane Conjecture: An Improved Lower Bound for Expansion, 2017, FOCS.

[26] G. Parisi. Correlation functions and computer simulations (II), 1981.

[27] Michael I. Jordan, et al. Underdamped Langevin MCMC: A non-asymptotic analysis, 2017, COLT.

[28] Santosh S. Vempala, et al. A Cubic Algorithm for Computing Gaussian Volume, 2013, SODA.

[29] S. Vempala. Geometric Random Walks: a Survey, 2007.

[30] Arnak S. Dalalyan, et al. On sampling from a log-concave density using kinetic Langevin diffusions, 2018, Bernoulli.

[31] Yurii Nesterov. Lectures on Convex Optimization, 2018.

[32] Lawrence Carin, et al. On the Convergence of Stochastic Gradient MCMC Algorithms with High-Order Integrators, 2015, NIPS.

[33] Hiroshi Nakagawa, et al. Approximation Analysis of Stochastic Gradient Langevin Dynamics by using Fokker-Planck Equation and Ito Process, 2014, ICML.

[34] É. Moulines, et al. On the convergence of Hamiltonian Monte Carlo, 2017, arXiv:1705.00166.

[35] Yuchen Zhang, et al. A Hitting Time Analysis of Stochastic Gradient Langevin Dynamics, 2017, COLT.

[36] Tianqi Chen, et al. Stochastic Gradient Hamiltonian Monte Carlo, 2014, ICML.

[37] Yee Whye Teh, et al. Consistency and Fluctuations For Stochastic Gradient Langevin Dynamics, 2014, J. Mach. Learn. Res.

[38] Thomas M. Cover, et al. Elements of Information Theory, 2005.

[39] Arnak S. Dalalyan, et al. User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient, 2017, Stochastic Processes and their Applications.

[40] Quanquan Gu, et al. Sampling from Non-Log-Concave Distributions via Variance-Reduced Gradient Langevin Dynamics, 2019, AISTATS.

[41] Martin J. Wainwright, et al. Fast mixing of Metropolized Hamiltonian Monte Carlo: Benefits of multi-step gradients, 2019, J. Mach. Learn. Res.

[42] Martin J. Wainwright, et al. Log-concave sampling: Metropolis-Hastings algorithms are fast!, 2018, COLT.

[43] Lawrence Carin, et al. A convergence analysis for a class of practical variance-reduction stochastic gradient MCMC, 2018, Science China Information Sciences.

[44] Matus Telgarsky, et al. Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis, 2017, COLT.

[45] Andrej Risteski, et al. Beyond Log-concavity: Provable Guarantees for Sampling Multi-modal Distributions using Simulated Tempering Langevin Monte Carlo, 2017, NeurIPS.

[46] R. Mazo. On the theory of Brownian motion, 1973.

[47] Nisheeth K. Vishnoi, et al. Nonconvex sampling with the Metropolis-adjusted Langevin algorithm, 2019, COLT.

[48] Jinghui Chen, et al. Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization, 2017, NeurIPS.

[49] Arnak S. Dalalyan, et al. Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent, 2017, COLT.

[50] A. Eberle. Couplings, distances and contractivity for diffusion processes revisited, 2013.

[51] Miklós Simonovits, et al. The mixing rate of Markov chains, an isoperimetric inequality, and computing the volume, 1990, FOCS.

[52] Ying Zhang, et al. On Stochastic Gradient Langevin Dynamics with Dependent Data Streams: The Fully Nonconvex Case, 2019, SIAM J. Math. Data Sci.

[53] Ohad Shamir, et al. Global Non-convex Optimization with Discretized Diffusions, 2018, NeurIPS.

[54] Robert L. Smith, et al. Efficient Monte Carlo Procedures for Generating Points Uniformly Distributed over Bounded Regions, 1984, Oper. Res.

[55] D. Bakry, et al. A simple proof of the Poincaré inequality for a large class of probability measures, 2008.

[56] Odalric-Ambrym Maillard, et al. Concentration inequalities for sampling without replacement, 2013, arXiv:1309.4029.

[57] J. D. Doll, et al. Brownian dynamics as smart Monte Carlo simulation, 1978.

[58] Miklós Simonovits, et al. Random Walks in a Convex Body and an Improved Volume Algorithm, 1993, Random Struct. Algorithms.

[59] Xi Chen, et al. On Stationary-Point Hitting Time and Ergodicity of Stochastic Gradient Langevin Dynamics, 2019, J. Mach. Learn. Res.

[60] Quanquan Gu, et al. Laplacian Smoothing Stochastic Gradient Markov Chain Monte Carlo, 2019, SIAM J. Sci. Comput.

[61] P. Mazur. On the theory of Brownian motion, 1959.

[62] Nisheeth K. Vishnoi, et al. Dimensionally Tight Bounds for Second-Order Hamiltonian Monte Carlo, 2018, NeurIPS.

[63] Andrew Gordon Wilson, et al. Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning, 2019, ICLR.

[64] Radford M. Neal. MCMC Using Hamiltonian Dynamics, 2011, arXiv:1206.1901.

[65] É. Moulines, et al. Non-asymptotic convergence analysis for the Unadjusted Langevin Algorithm, 2015, arXiv:1507.05021.

[66] Ömer Deniz Akyildiz, et al. Nonasymptotic Estimates for Stochastic Gradient Langevin Dynamics Under Local Conditions in Nonconvex Optimization, 2019, Applied Mathematics & Optimization.

[67] P. Buser. A note on the isoperimetric constant, 1982.