Faster Convergence of Stochastic Gradient Langevin Dynamics for Non-Log-Concave Sampling

We establish a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave. At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov chain. Under certain conditions on the target distribution, we prove that $\tilde O(d^4\epsilon^{-2})$ stochastic gradient evaluations suffice to guarantee $\epsilon$-sampling error in total variation distance, where $d$ is the problem dimension; this improves existing results on the convergence rate of SGLD (Raginsky et al., 2017; Xu et al., 2018). We further show that, under an additional Hessian Lipschitz condition on the log-density function, SGLD achieves $\epsilon$-sampling error within $\tilde O(d^{15/4}\epsilon^{-3/2})$ stochastic gradient evaluations. Our proof technique provides a new way to study the convergence of Langevin-based algorithms and sheds light on the design of fast stochastic gradient-based sampling algorithms.
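For concreteness, below is a minimal sketch of the SGLD iteration analyzed above, following Welling and Teh [6]: each step perturbs a minibatch stochastic gradient update with injected Gaussian noise whose scale is tied to the step size. The names (`grad_fi`, `eta`, `beta`, `batch_size`) and all default values are illustrative placeholders, not quantities from our analysis.

```python
import numpy as np

def sgld(grad_fi, x0, n_data, n_steps, eta=1e-3, beta=1.0, batch_size=32, seed=0):
    """Minimal SGLD sampler (a sketch, not the paper's exact setup).

    grad_fi(x, idx) -> average gradient over the component functions
    indexed by idx, i.e. an unbiased estimate of the full gradient of F,
    where the target density is proportional to exp(-beta * F(x)).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_steps):
        idx = rng.choice(n_data, size=batch_size, replace=False)  # minibatch
        g = grad_fi(x, idx)                    # stochastic gradient estimate
        noise = rng.standard_normal(x.shape)   # isotropic Gaussian injection
        # Euler-Maruyama step of the Langevin diffusion, with the full
        # gradient replaced by its minibatch estimate.
        x = x - eta * g + np.sqrt(2.0 * eta / beta) * noise
        samples.append(x.copy())
    return np.stack(samples)
```

As a toy usage check: with `grad_fi = lambda x, idx: x` (so $F(x) = \|x\|^2/2$ and every component gradient equals $x$), the iterates approximately sample from $N(0, \beta^{-1} I_d)$ once the step size is small and the chain has mixed.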

References

[1] Santosh S. Vempala, et al. Rapid Convergence of the Unadjusted Langevin Algorithm: Isoperimetry Suffices, 2019, NeurIPS.

[2] Santosh S. Vempala, et al. Convergence rate of Riemannian Hamiltonian Monte Carlo and faster polytope volume computation, 2017, STOC.

[3] Santosh S. Vempala, et al. Rapid Convergence of the Unadjusted Langevin Algorithm: Log-Sobolev Suffices, 2019, NeurIPS.

[4] Jian Peng, et al. Accelerating Nonconvex Learning via Replica Exchange Langevin Diffusion, 2020, ICLR.

[5] Gaël Richard, et al. Non-Asymptotic Analysis of Fractional Langevin Monte Carlo for Non-Convex Optimization, 2019, ICML.

[6] Yee Whye Teh, et al. Bayesian Learning via Stochastic Gradient Langevin Dynamics, 2011, ICML.

[7] Yee Whye Teh, et al. Exploration of the (Non-)Asymptotic Bias and Variance of Stochastic Gradient Langevin Dynamics, 2016, J. Mach. Learn. Res.

[8] Ohad Shamir. A Variant of Azuma's Inequality for Martingales with Subgaussian Tails, 2011, arXiv.

[9] Martin J. Wainwright, et al. Improved bounds for discretization of Langevin diffusions: Near-optimal rates without convexity, 2019, Bernoulli.

[10] Michael I. Jordan, et al. Sharp Convergence Rates for Langevin Dynamics in the Nonconvex Setting, 2018, arXiv.

[11] A. Eberle, et al. Coupling and convergence for Hamiltonian Monte Carlo, 2018, The Annals of Applied Probability.

[12] E. Vanden-Eijnden, et al. Non-asymptotic mixing of the MALA algorithm, 2010, arXiv:1008.3514.

[13] Michael I. Jordan, et al. Sampling can be faster than optimization, 2018, Proceedings of the National Academy of Sciences.

[14] M. Ledoux. A simple analytic proof of an inequality by P. Buser, 1994.

[15] Faming Liang, et al. Non-convex Learning via Replica Exchange Stochastic Gradient MCMC, 2020, ICML.

[16] Mert Gürbüzbalaban, et al. Global Convergence of Stochastic Gradient Hamiltonian Monte Carlo for Non-Convex Stochastic Optimization: Non-Asymptotic Performance Bounds and Momentum-Based Acceleration, 2018, Oper. Res.

[17] Mert Gürbüzbalaban, et al. Breaking Reversibility Accelerates Langevin Dynamics for Global Non-Convex Optimization, 2018, NeurIPS.

[18] A. Dalalyan. Theoretical guarantees for approximate sampling from smooth and log-concave densities, 2014, arXiv:1412.7392.

[19] É. Moulines, et al. Sampling from a strongly log-concave distribution with the Unadjusted Langevin Algorithm, 2016.

[20] R. Tweedie, et al. Exponential convergence of Langevin distributions and their discrete approximations, 1996.

[21] C. Hwang, et al. Diffusion for global optimization in $\mathbb{R}^n$, 1987.

[22] A. Eberle, et al. Couplings and quantitative contraction rates for Langevin dynamics, 2017, The Annals of Probability.

[23] R. Tweedie, et al. Rates of convergence of the Hastings and Metropolis algorithms, 1996.

[24] Quanquan Gu, et al. Stochastic Gradient Hamiltonian Monte Carlo Methods with Recursive Variance Reduction, 2019, NeurIPS.

[25] Santosh S. Vempala, et al. Eldan's Stochastic Localization and the KLS Hyperplane Conjecture: An Improved Lower Bound for Expansion, 2017, FOCS.

[26] G. Parisi. Correlation functions and computer simulations (II), 1981.

[27] Michael I. Jordan, et al. Underdamped Langevin MCMC: A non-asymptotic analysis, 2017, COLT.

[28] Santosh S. Vempala, et al. A Cubic Algorithm for Computing Gaussian Volume, 2013, SODA.

[29] S. Vempala. Geometric Random Walks: a Survey, 2007.

[30] Arnak S. Dalalyan, et al. On sampling from a log-concave density using kinetic Langevin diffusions, 2018, Bernoulli.

[31] Yurii Nesterov. Lectures on Convex Optimization, 2018.

[32] Lawrence Carin, et al. On the Convergence of Stochastic Gradient MCMC Algorithms with High-Order Integrators, 2015, NIPS.

[33] Hiroshi Nakagawa, et al. Approximation Analysis of Stochastic Gradient Langevin Dynamics by using Fokker-Planck Equation and Ito Process, 2014, ICML.

[34] É. Moulines, et al. On the convergence of Hamiltonian Monte Carlo, 2017, arXiv:1705.00166.

[35] Yuchen Zhang, et al. A Hitting Time Analysis of Stochastic Gradient Langevin Dynamics, 2017, COLT.

[36] Tianqi Chen, et al. Stochastic Gradient Hamiltonian Monte Carlo, 2014, ICML.

[37] Yee Whye Teh, et al. Consistency and Fluctuations For Stochastic Gradient Langevin Dynamics, 2014, J. Mach. Learn. Res.

[38] Thomas M. Cover, et al. Elements of Information Theory, 2005.

[39] Arnak S. Dalalyan, et al. User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient, 2017, Stochastic Processes and their Applications.

[40] Quanquan Gu, et al. Sampling from Non-Log-Concave Distributions via Variance-Reduced Gradient Langevin Dynamics, 2019, AISTATS.

[41] Martin J. Wainwright, et al. Fast mixing of Metropolized Hamiltonian Monte Carlo: Benefits of multi-step gradients, 2019, J. Mach. Learn. Res.

[42] Martin J. Wainwright, et al. Log-concave sampling: Metropolis-Hastings algorithms are fast!, 2018, COLT.

[43] Lawrence Carin, et al. A convergence analysis for a class of practical variance-reduction stochastic gradient MCMC, 2018, Science China Information Sciences.

[44] Matus Telgarsky, et al. Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis, 2017, COLT.

[45] Andrej Risteski, et al. Beyond Log-concavity: Provable Guarantees for Sampling Multi-modal Distributions using Simulated Tempering Langevin Monte Carlo, 2017, NeurIPS.

[46] R. Mazo. On the theory of Brownian motion, 1973.

[47] Nisheeth K. Vishnoi, et al. Nonconvex sampling with the Metropolis-adjusted Langevin algorithm, 2019, COLT.

[48] Jinghui Chen, et al. Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization, 2017, NeurIPS.

[49] Arnak S. Dalalyan, et al. Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent, 2017, COLT.

[50] A. Eberle. Couplings, distances and contractivity for diffusion processes revisited, 2013.

[51] Miklós Simonovits, et al. The mixing rate of Markov chains, an isoperimetric inequality, and computing the volume, 1990, FOCS.

[52] Ying Zhang, et al. On Stochastic Gradient Langevin Dynamics with Dependent Data Streams: The Fully Nonconvex Case, 2019, SIAM J. Math. Data Sci.

[53] Ohad Shamir, et al. Global Non-convex Optimization with Discretized Diffusions, 2018, NeurIPS.

[54] Robert L. Smith, et al. Efficient Monte Carlo Procedures for Generating Points Uniformly Distributed over Bounded Regions, 1984, Oper. Res.

[55] D. Bakry, et al. A simple proof of the Poincaré inequality for a large class of probability measures, 2008.

[56] Odalric-Ambrym Maillard, et al. Concentration inequalities for sampling without replacement, 2013, arXiv:1309.4029.

[57] J. D. Doll, et al. Brownian dynamics as smart Monte Carlo simulation, 1978.

[58] Miklós Simonovits, et al. Random Walks in a Convex Body and an Improved Volume Algorithm, 1993, Random Struct. Algorithms.

[59] Xi Chen, et al. On Stationary-Point Hitting Time and Ergodicity of Stochastic Gradient Langevin Dynamics, 2019, J. Mach. Learn. Res.

[60] Quanquan Gu, et al. Laplacian Smoothing Stochastic Gradient Markov Chain Monte Carlo, 2019, SIAM J. Sci. Comput.

[61] P. Mazur. On the theory of Brownian motion, 1959.

[62] Nisheeth K. Vishnoi, et al. Dimensionally Tight Bounds for Second-Order Hamiltonian Monte Carlo, 2018, NeurIPS.

[63] Andrew Gordon Wilson, et al. Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning, 2019, ICLR.

[64] Radford M. Neal. MCMC Using Hamiltonian Dynamics, 2011, arXiv:1206.1901.

[65] É. Moulines, et al. Non-asymptotic convergence analysis for the Unadjusted Langevin Algorithm, 2015, arXiv:1507.05021.

[66] Ömer Deniz Akyildiz, et al. Nonasymptotic Estimates for Stochastic Gradient Langevin Dynamics Under Local Conditions in Nonconvex Optimization, 2019, Applied Mathematics & Optimization.

[67] P. Buser. A note on the isoperimetric constant, 1982.