Faster Convergence of Stochastic Gradient Langevin Dynamics for Non-Log-Concave Sampling

We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave. At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov chain. Under certain conditions on the target distribution, we prove that $\tilde O(d^4\epsilon^{-2})$ stochastic gradient evaluations suffice to guarantee $\epsilon$-sampling error in total variation distance, where $d$ is the problem dimension. This improves existing results on the convergence rate of SGLD (Raginsky et al., 2017; Xu et al., 2018). We further show that, under an additional Hessian Lipschitz condition on the log-density function, SGLD achieves $\epsilon$-sampling error within $\tilde O(d^{15/4}\epsilon^{-3/2})$ stochastic gradient evaluations. Our proof technique provides a new way to study the convergence of Langevin-based algorithms and sheds light on the design of fast stochastic gradient-based sampling algorithms.
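For concreteness, below is a minimal sketch of the SGLD iteration that this line of work analyzes: $\theta_{k+1} = \theta_k - \eta\, g_k + \sqrt{2\eta}\,\xi_k$, where $g_k$ is a stochastic estimate of the gradient of the negative log-density and $\xi_k \sim N(0, I_d)$. The function names (`sgld`, `noisy_double_well_grad`), the double-well toy target, and the additive-noise model for the minibatch gradient are illustrative assumptions of ours, not part of the paper.

```python
import numpy as np

def sgld(stoch_grad, theta0, eta, n_iter, rng=None):
    """Sketch of the SGLD update: theta <- theta - eta * g + sqrt(2*eta) * xi,
    where g is a stochastic estimate of grad F(theta) for the negative
    log-density F, and xi is standard Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.array(theta0, dtype=float)
    trace = [theta.copy()]
    for _ in range(n_iter):
        g = stoch_grad(theta, rng)              # stochastic gradient estimate
        xi = rng.standard_normal(theta.shape)   # isotropic Gaussian noise
        theta = theta - eta * g + np.sqrt(2.0 * eta) * xi
        trace.append(theta.copy())
    return np.array(trace)

# Toy non-log-concave target: pi(theta) ∝ exp(-F(theta)) with the double-well
# potential F(theta) = (||theta||^2 - 1)^2, whose exact gradient is
# 4 * (||theta||^2 - 1) * theta; additive Gaussian noise stands in for the
# minibatch subsampling error of a true stochastic gradient.
def noisy_double_well_grad(theta, rng, noise_scale=0.1):
    exact = 4.0 * (theta @ theta - 1.0) * theta
    return exact + noise_scale * rng.standard_normal(theta.shape)

samples = sgld(noisy_double_well_grad, theta0=np.zeros(2), eta=1e-3, n_iter=50_000)
```

In this accounting, each call to `stoch_grad` is one stochastic gradient evaluation, so the $\tilde O(d^4\epsilon^{-2})$ bound in the abstract counts the total number of such calls needed to reach $\epsilon$ total variation error.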

[1] Santosh S. Vempala, et al. Convergence rate of Riemannian Hamiltonian Monte Carlo and faster polytope volume computation, 2017, STOC.

[2] Arnak S. Dalalyan, et al. Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent, 2017, COLT.

[3] Quanquan Gu, et al. Stochastic Gradient Hamiltonian Monte Carlo Methods with Recursive Variance Reduction, 2019, NeurIPS.

[4] Santosh S. Vempala, et al. A Cubic Algorithm for Computing Gaussian Volume, 2013, SODA.

[5] Odalric-Ambrym Maillard, et al. Concentration inequalities for sampling without replacement, 2013, arXiv:1309.4029.

[6] Yuansi Chen. An Almost Constant Lower Bound of the Isoperimetric Coefficient in the KLS Conjecture, 2020, arXiv:2011.13661.

[7] Nisheeth K. Vishnoi, et al. Dimensionally Tight Bounds for Second-Order Hamiltonian Monte Carlo, 2018, NeurIPS.

[8] Santosh S. Vempala, et al. Rapid Convergence of the Unadjusted Langevin Algorithm: Log-Sobolev Suffices, 2019, NeurIPS.

[9] Ying Zhang, et al. On Stochastic Gradient Langevin Dynamics with Dependent Data Streams: The Fully Nonconvex Case, 2019, SIAM J. Math. Data Sci.

[10] Ohad Shamir. A Variant of Azuma's Inequality for Martingales with Subgaussian Tails, 2011, arXiv.

[11] Miklós Simonovits, et al. Isoperimetric problems for convex bodies and a localization lemma, 1995, Discret. Comput. Geom.

[12] Matus Telgarsky, et al. Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis, 2017, COLT.

[13] A. Eberle. Couplings, distances and contractivity for diffusion processes revisited, 2013.

[14] Thomas M. Cover, et al. Elements of Information Theory, 2005.

[15] Michael I. Jordan, et al. Underdamped Langevin MCMC: A non-asymptotic analysis, 2017, COLT.

[16] Andrej Risteski, et al. Beyond Log-concavity: Provable Guarantees for Sampling Multi-modal Distributions using Simulated Tempering Langevin Monte Carlo, 2017, NeurIPS.

[17] Yuchen Zhang, et al. A Hitting Time Analysis of Stochastic Gradient Langevin Dynamics, 2017, COLT.

[18] R. Tweedie, et al. Rates of convergence of the Hastings and Metropolis algorithms, 1996.

[19] É. Moulines, et al. On the convergence of Hamiltonian Monte Carlo, 2017, arXiv:1705.00166.

[20] Faming Liang, et al. Non-convex Learning via Replica Exchange Stochastic Gradient MCMC, 2020, ICML.

[21] É. Moulines, et al. Sampling from a strongly log-concave distribution with the Unadjusted Langevin Algorithm, 2016.

[22] Mert Gürbüzbalaban, et al. Breaking Reversibility Accelerates Langevin Dynamics for Global Non-Convex Optimization, 2018, NeurIPS.

[23] Arnak S. Dalalyan, et al. User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient, 2017, Stochastic Processes and their Applications.

[24] Andrew Gordon Wilson, et al. Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning, 2019, ICLR.

[25] Miklós Simonovits, et al. The mixing rate of Markov chains, an isoperimetric inequality, and computing the volume, 1990, Proceedings of the 31st Annual Symposium on Foundations of Computer Science (FOCS).

[26] R. Mazo. On the theory of Brownian motion, 1973.

[27] Lawrence Carin, et al. On the Convergence of Stochastic Gradient MCMC Algorithms with High-Order Integrators, 2015, NIPS.

[28] Quanquan Gu, et al. Laplacian Smoothing Stochastic Gradient Markov Chain Monte Carlo, 2019, SIAM J. Sci. Comput.

[29] Radford M. Neal. MCMC Using Hamiltonian Dynamics, 2011, arXiv:1206.1901.

[30] Ohad Shamir, et al. Global Non-convex Optimization with Discretized Diffusions, 2018, NeurIPS.

[31] R. Tweedie, et al. Exponential convergence of Langevin distributions and their discrete approximations, 1996.

[32] Xi Chen, et al. On Stationary-Point Hitting Time and Ergodicity of Stochastic Gradient Langevin Dynamics, 2019, J. Mach. Learn. Res.

[33] Gaël Richard, et al. Non-Asymptotic Analysis of Fractional Langevin Monte Carlo for Non-Convex Optimization, 2019, ICML.

[34] Martin J. Wainwright, et al. Improved bounds for discretization of Langevin diffusions: Near-optimal rates without convexity, 2019, Bernoulli.

[35] A. Eberle, et al. Couplings and quantitative contraction rates for Langevin dynamics, 2017, The Annals of Probability.

[36] A. Eberle, et al. Coupling and convergence for Hamiltonian Monte Carlo, 2018, The Annals of Applied Probability.

[37] Lawrence Carin, et al. A convergence analysis for a class of practical variance-reduction stochastic gradient MCMC, 2018, Science China Information Sciences.

[38] Tianqi Chen, et al. Stochastic Gradient Hamiltonian Monte Carlo, 2014, ICML.

[39] Jinghui Chen, et al. Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization, 2017, NeurIPS.

[40] S. Vempala. Geometric Random Walks: a Survey, 2007.

[41] Yee Whye Teh, et al. Exploration of the (Non-)Asymptotic Bias and Variance of Stochastic Gradient Langevin Dynamics, 2016, J. Mach. Learn. Res.

[42] M. Ledoux. A simple analytic proof of an inequality by P. Buser, 1994.

[43] Nisheeth K. Vishnoi, et al. Nonconvex sampling with the Metropolis-adjusted Langevin algorithm, 2019, COLT.

[44] Yee Whye Teh, et al. Consistency and Fluctuations For Stochastic Gradient Langevin Dynamics, 2014, J. Mach. Learn. Res.

[45] Santosh S. Vempala, et al. Eldan's Stochastic Localization and the KLS Hyperplane Conjecture: An Improved Lower Bound for Expansion, 2016, 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS).

[46] Quanquan Gu, et al. Sampling from Non-Log-Concave Distributions via Variance-Reduced Gradient Langevin Dynamics, 2019, AISTATS.

[47] Miklós Simonovits, et al. Random Walks in a Convex Body and an Improved Volume Algorithm, 1993, Random Struct. Algorithms.

[48] A. Dalalyan. Theoretical guarantees for approximate sampling from smooth and log-concave densities, 2014, arXiv:1412.7392.

[49] Michael I. Jordan, et al. Sampling can be faster than optimization, 2018, Proceedings of the National Academy of Sciences.

[50] Martin J. Wainwright, et al. Fast mixing of Metropolized Hamiltonian Monte Carlo: Benefits of multi-step gradients, 2019, J. Mach. Learn. Res.

[51] É. Moulines, et al. Non-asymptotic convergence analysis for the Unadjusted Langevin Algorithm, 2015, arXiv:1507.05021.

[52] C. Hwang, et al. Diffusion for global optimization in $\mathbb{R}^n$, 1987.

[53] Yee Whye Teh, et al. Bayesian Learning via Stochastic Gradient Langevin Dynamics, 2011, ICML.

[54] Jian Peng, et al. Accelerating Nonconvex Learning via Replica Exchange Langevin diffusion, 2020, ICLR.

[55] Martin J. Wainwright, et al. Log-concave sampling: Metropolis-Hastings algorithms are fast!, 2018, COLT.

[56] Robert L. Smith, et al. Efficient Monte Carlo Procedures for Generating Points Uniformly Distributed over Bounded Regions, 1984, Oper. Res.

[57] E. Vanden-Eijnden, et al. Non-asymptotic mixing of the MALA algorithm, 2010, arXiv:1008.3514.

[58] Mert Gürbüzbalaban, et al. Global Convergence of Stochastic Gradient Hamiltonian Monte Carlo for Non-Convex Stochastic Optimization: Non-Asymptotic Performance Bounds and Momentum-Based Acceleration, 2018, Oper. Res.

[59] Michael I. Jordan, et al. Sharp Convergence Rates for Langevin Dynamics in the Nonconvex Setting, 2018, arXiv.

[60] Arnak S. Dalalyan, et al. On sampling from a log-concave density using kinetic Langevin diffusions, 2018, Bernoulli.

[61] Ömer Deniz Akyildiz, et al. Nonasymptotic Estimates for Stochastic Gradient Langevin Dynamics Under Local Conditions in Nonconvex Optimization, 2019, Applied Mathematics & Optimization.

[62] D. Bakry, et al. A simple proof of the Poincaré inequality for a large class of probability measures, 2008.

[63] P. Buser. A note on the isoperimetric constant, 1982.

[64] Hiroshi Nakagawa, et al. Approximation Analysis of Stochastic Gradient Langevin Dynamics by using Fokker-Planck Equation and Ito Process, 2014, ICML.

[65] J. D. Doll, et al. Brownian dynamics as smart Monte Carlo simulation, 1978.