Faster Convergence of Stochastic Gradient Langevin Dynamics for Non-Log-Concave Sampling

We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave. At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov chain. Under certain conditions on the target distribution, we prove that $\tilde O(d^4\epsilon^{-2})$ stochastic gradient evaluations suffice to guarantee $\epsilon$-sampling error in total variation distance, where $d$ is the problem dimension. This improves existing results on the convergence rate of SGLD (Raginsky et al., 2017; Xu et al., 2018). We further show that, under an additional Hessian Lipschitz condition on the log-density function, SGLD achieves $\epsilon$-sampling error within $\tilde O(d^{15/4}\epsilon^{-3/2})$ stochastic gradient evaluations. Our proof technique provides a new way to study the convergence of Langevin-based algorithms and sheds light on the design of fast stochastic gradient-based sampling algorithms.
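For concreteness, below is a minimal sketch of the SGLD iteration that this line of work analyzes: $\theta_{k+1} = \theta_k - \eta\, g_k + \sqrt{2\eta}\,\xi_k$, where $g_k$ is a stochastic estimate of the gradient of the negative log-density and $\xi_k \sim N(0, I_d)$. The function names (`sgld`, `noisy_double_well_grad`), the double-well toy target, and the additive-noise model for the minibatch gradient are illustrative assumptions of ours, not part of the paper.

```python
import numpy as np

def sgld(stoch_grad, theta0, eta, n_iter, rng=None):
    """Sketch of the SGLD update: theta <- theta - eta * g + sqrt(2*eta) * xi,
    where g is a stochastic estimate of grad F(theta) for the negative
    log-density F, and xi is standard Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.array(theta0, dtype=float)
    trace = [theta.copy()]
    for _ in range(n_iter):
        g = stoch_grad(theta, rng)              # stochastic gradient estimate
        xi = rng.standard_normal(theta.shape)   # isotropic Gaussian noise
        theta = theta - eta * g + np.sqrt(2.0 * eta) * xi
        trace.append(theta.copy())
    return np.array(trace)

# Toy non-log-concave target: pi(theta) ∝ exp(-F(theta)) with the double-well
# potential F(theta) = (||theta||^2 - 1)^2, whose exact gradient is
# 4 * (||theta||^2 - 1) * theta; additive Gaussian noise stands in for the
# minibatch subsampling error of a true stochastic gradient.
def noisy_double_well_grad(theta, rng, noise_scale=0.1):
    exact = 4.0 * (theta @ theta - 1.0) * theta
    return exact + noise_scale * rng.standard_normal(theta.shape)

samples = sgld(noisy_double_well_grad, theta0=np.zeros(2), eta=1e-3, n_iter=50_000)
```

In this accounting, each call to `stoch_grad` is one stochastic gradient evaluation, so the $\tilde O(d^4\epsilon^{-2})$ bound in the abstract counts the total number of such calls needed to reach $\epsilon$ total variation error.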

[1] Santosh S. Vempala, et al. Convergence rate of Riemannian Hamiltonian Monte Carlo and faster polytope volume computation, 2017, STOC.

[2] Arnak S. Dalalyan, et al. Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent, 2017, COLT.

[3] Quanquan Gu, et al. Stochastic Gradient Hamiltonian Monte Carlo Methods with Recursive Variance Reduction, 2019, NeurIPS.

[4] Santosh S. Vempala, et al. A Cubic Algorithm for Computing Gaussian Volume, 2013, SODA.

[5] Odalric-Ambrym Maillard, et al. Concentration inequalities for sampling without replacement, 2013, arXiv:1309.4029.

[6] Yuansi Chen. An Almost Constant Lower Bound of the Isoperimetric Coefficient in the KLS Conjecture, 2020, arXiv:2011.13661.

[7] Nisheeth K. Vishnoi, et al. Dimensionally Tight Bounds for Second-Order Hamiltonian Monte Carlo, 2018, NeurIPS.

[8] Santosh S. Vempala, et al. Rapid Convergence of the Unadjusted Langevin Algorithm: Log-Sobolev Suffices, 2019, NeurIPS.

[9] Ying Zhang, et al. On Stochastic Gradient Langevin Dynamics with Dependent Data Streams: The Fully Nonconvex Case, 2019, SIAM J. Math. Data Sci.

[10] Ohad Shamir. A Variant of Azuma's Inequality for Martingales with Subgaussian Tails, 2011, arXiv.

[11] Miklós Simonovits, et al. Isoperimetric problems for convex bodies and a localization lemma, 1995, Discret. Comput. Geom.

[12] Matus Telgarsky, et al. Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis, 2017, COLT.

[13] A. Eberle. Couplings, distances and contractivity for diffusion processes revisited, 2013.

[14] Thomas M. Cover, et al. Elements of Information Theory, 2005.

[15] Michael I. Jordan, et al. Underdamped Langevin MCMC: A non-asymptotic analysis, 2017, COLT.

[16] Andrej Risteski, et al. Beyond Log-concavity: Provable Guarantees for Sampling Multi-modal Distributions using Simulated Tempering Langevin Monte Carlo, 2017, NeurIPS.

[17] Yuchen Zhang, et al. A Hitting Time Analysis of Stochastic Gradient Langevin Dynamics, 2017, COLT.

[18] R. Tweedie, et al. Rates of convergence of the Hastings and Metropolis algorithms, 1996.

[19] É. Moulines, et al. On the convergence of Hamiltonian Monte Carlo, 2017, arXiv:1705.00166.

[20] Faming Liang, et al. Non-convex Learning via Replica Exchange Stochastic Gradient MCMC, 2020, ICML.

[21] É. Moulines, et al. Sampling from a strongly log-concave distribution with the Unadjusted Langevin Algorithm, 2016.

[22] Mert Gürbüzbalaban, et al. Breaking Reversibility Accelerates Langevin Dynamics for Global Non-Convex Optimization, 2018, NeurIPS.

[23] Arnak S. Dalalyan, et al. User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient, 2017, Stochastic Processes and their Applications.

[24] Andrew Gordon Wilson, et al. Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning, 2019, ICLR.

[25] Miklós Simonovits, et al. The mixing rate of Markov chains, an isoperimetric inequality, and computing the volume, 1990, Proceedings of the 31st Annual Symposium on Foundations of Computer Science (FOCS).

[26] R. Mazo. On the theory of Brownian motion, 1973.

[27] Lawrence Carin, et al. On the Convergence of Stochastic Gradient MCMC Algorithms with High-Order Integrators, 2015, NIPS.

[28] Quanquan Gu, et al. Laplacian Smoothing Stochastic Gradient Markov Chain Monte Carlo, 2019, SIAM J. Sci. Comput.

[29] Radford M. Neal. MCMC Using Hamiltonian Dynamics, 2011, arXiv:1206.1901.

[30] Ohad Shamir, et al. Global Non-convex Optimization with Discretized Diffusions, 2018, NeurIPS.

[31] R. Tweedie, et al. Exponential convergence of Langevin distributions and their discrete approximations, 1996.

[32] Xi Chen, et al. On Stationary-Point Hitting Time and Ergodicity of Stochastic Gradient Langevin Dynamics, 2019, J. Mach. Learn. Res.

[33] Gaël Richard, et al. Non-Asymptotic Analysis of Fractional Langevin Monte Carlo for Non-Convex Optimization, 2019, ICML.

[34] Martin J. Wainwright, et al. Improved bounds for discretization of Langevin diffusions: Near-optimal rates without convexity, 2019, Bernoulli.

[35] A. Eberle, et al. Couplings and quantitative contraction rates for Langevin dynamics, 2017, The Annals of Probability.

[36] A. Eberle, et al. Coupling and convergence for Hamiltonian Monte Carlo, 2018, The Annals of Applied Probability.

[37] Lawrence Carin, et al. A convergence analysis for a class of practical variance-reduction stochastic gradient MCMC, 2018, Science China Information Sciences.

[38] Tianqi Chen, et al. Stochastic Gradient Hamiltonian Monte Carlo, 2014, ICML.

[39] Jinghui Chen, et al. Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization, 2017, NeurIPS.

[40] S. Vempala. Geometric Random Walks: a Survey, 2007.

[41] Yee Whye Teh, et al. Exploration of the (Non-)Asymptotic Bias and Variance of Stochastic Gradient Langevin Dynamics, 2016, J. Mach. Learn. Res.

[42] M. Ledoux. A simple analytic proof of an inequality by P. Buser, 1994.

[43] Nisheeth K. Vishnoi, et al. Nonconvex sampling with the Metropolis-adjusted Langevin algorithm, 2019, COLT.

[44] Yee Whye Teh, et al. Consistency and Fluctuations For Stochastic Gradient Langevin Dynamics, 2014, J. Mach. Learn. Res.

[45] Santosh S. Vempala, et al. Eldan's Stochastic Localization and the KLS Hyperplane Conjecture: An Improved Lower Bound for Expansion, 2016, 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS).

[46] Quanquan Gu, et al. Sampling from Non-Log-Concave Distributions via Variance-Reduced Gradient Langevin Dynamics, 2019, AISTATS.

[47] Miklós Simonovits, et al. Random Walks in a Convex Body and an Improved Volume Algorithm, 1993, Random Struct. Algorithms.

[48] A. Dalalyan. Theoretical guarantees for approximate sampling from smooth and log-concave densities, 2014, arXiv:1412.7392.

[49] Michael I. Jordan, et al. Sampling can be faster than optimization, 2018, Proceedings of the National Academy of Sciences.

[50] Martin J. Wainwright, et al. Fast mixing of Metropolized Hamiltonian Monte Carlo: Benefits of multi-step gradients, 2019, J. Mach. Learn. Res.

[51] É. Moulines, et al. Non-asymptotic convergence analysis for the Unadjusted Langevin Algorithm, 2015, arXiv:1507.05021.

[52] C. Hwang, et al. Diffusion for global optimization in $\mathbb{R}^n$, 1987.

[53] Yee Whye Teh, et al. Bayesian Learning via Stochastic Gradient Langevin Dynamics, 2011, ICML.

[54] Jian Peng, et al. Accelerating Nonconvex Learning via Replica Exchange Langevin diffusion, 2020, ICLR.

[55] Martin J. Wainwright, et al. Log-concave sampling: Metropolis-Hastings algorithms are fast!, 2018, COLT.

[56] Robert L. Smith, et al. Efficient Monte Carlo Procedures for Generating Points Uniformly Distributed over Bounded Regions, 1984, Oper. Res.

[57] E. Vanden-Eijnden, et al. Non-asymptotic mixing of the MALA algorithm, 2010, arXiv:1008.3514.

[58] Mert Gürbüzbalaban, et al. Global Convergence of Stochastic Gradient Hamiltonian Monte Carlo for Non-Convex Stochastic Optimization: Non-Asymptotic Performance Bounds and Momentum-Based Acceleration, 2018, Oper. Res.

[59] Michael I. Jordan, et al. Sharp Convergence Rates for Langevin Dynamics in the Nonconvex Setting, 2018, arXiv.

[60] Arnak S. Dalalyan, et al. On sampling from a log-concave density using kinetic Langevin diffusions, 2018, Bernoulli.

[61] Ömer Deniz Akyildiz, et al. Nonasymptotic Estimates for Stochastic Gradient Langevin Dynamics Under Local Conditions in Nonconvex Optimization, 2019, Applied Mathematics & Optimization.

[62] D. Bakry, et al. A simple proof of the Poincaré inequality for a large class of probability measures, 2008.

[63] P. Buser. A note on the isoperimetric constant, 1982.

[64] Hiroshi Nakagawa, et al. Approximation Analysis of Stochastic Gradient Langevin Dynamics by using Fokker-Planck Equation and Ito Process, 2014, ICML.

[65] J. D. Doll, et al. Brownian dynamics as smart Monte Carlo simulation, 1978.