Sampling from Non-Log-Concave Distributions via Stochastic Variance-Reduced Gradient Langevin Dynamics

We study two Langevin dynamics algorithms based on stochastic variance reduction, SVRG-LD and SAGA-LD (Dubey et al., 2016), for sampling from non-log-concave distributions. Under certain assumptions on the log density function, we establish convergence guarantees for SVRG-LD and SAGA-LD in 2-Wasserstein distance. More specifically, we show that both SVRG-LD and SAGA-LD require Õ(n + n^{3/4}/ε^2 + n^{1/2}/ε^4) · exp(Õ(d + γ)) stochastic gradient evaluations to achieve ε-accuracy in 2-Wasserstein distance, which outperforms the Õ(n/ε^4) · exp(Õ(d + γ)) gradient complexity achieved by the Langevin Monte Carlo method (Raginsky et al., 2017). Experiments on synthetic and real data back up our theory.
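For readers unfamiliar with SVRG-LD, the following is a minimal sketch of the update as it is commonly written, not the authors' implementation. It assumes a target proportional to exp(-γ F(x)) with F(x) = (1/n) Σ f_i(x); the function name `svrg_ld`, its parameters, and the toy quadratic components f_i are all illustrative choices that do not come from the abstract. The toy target is a Gaussian (log-concave), used here only as a sanity check because its mean is known in closed form.

```python
import numpy as np

def svrg_ld(grad_i, n, x0, step, gamma, epoch_len, batch, n_iters, rng):
    """Sketch of SVRG-LD for a target proportional to exp(-gamma * F(x)),
    where F(x) = (1/n) * sum_i f_i(x).

    grad_i(x, idx) must return the gradients of f_i at x for every i in idx,
    as an array of shape (len(idx), d). All names here are hypothetical.
    """
    x = np.asarray(x0, dtype=float).copy()
    all_idx = np.arange(n)
    samples = np.empty((n_iters, x.size))
    for k in range(n_iters):
        if k % epoch_len == 0:
            # Start of an epoch: store a snapshot and its full gradient.
            snapshot = x.copy()
            full_grad = grad_i(snapshot, all_idx).mean(axis=0)
        # Mini-batch sampled uniformly with replacement.
        idx = rng.integers(0, n, size=batch)
        # Variance-reduced gradient: mini-batch difference plus full gradient.
        g = (grad_i(x, idx) - grad_i(snapshot, idx)).mean(axis=0) + full_grad
        # Langevin step: gradient descent plus scaled Gaussian noise.
        noise = rng.standard_normal(x.size)
        x = x - step * g + np.sqrt(2.0 * step / gamma) * noise
        samples[k] = x
    return samples

# Toy check (assumption): f_i(x) = 0.5 * ||x - a_i||^2, so the target is a
# Gaussian centred at the mean of the a_i.
rng = np.random.default_rng(0)
a = rng.normal(size=(200, 2)) + np.array([1.0, -2.0])
quad_grad = lambda x, idx: x[None, :] - a[idx]
samples = svrg_ld(quad_grad, n=200, x0=np.zeros(2), step=0.05, gamma=1.0,
                  epoch_len=20, batch=10, n_iters=20000, rng=rng)
estimate = samples[1000:].mean(axis=0)  # discard a short burn-in
```

In this sketch each epoch costs n gradient evaluations for the snapshot plus 2 · batch per iteration, which is the kind of trade-off the gradient complexity bounds above account for.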

[1]  Jian Peng,et al.  Accelerating Nonconvex Learning via Replica Exchange Langevin diffusion , 2020, ICLR.

[2]  Jian Li,et al.  Stochastic gradient Hamiltonian Monte Carlo with variance reduction for Bayesian inference , 2018, Machine Learning.

[3]  Arnak S. Dalalyan,et al.  User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient , 2017, Stochastic Processes and their Applications.

[4]  Robert Kohn,et al.  Hamiltonian Monte Carlo with Energy Conserving Subsampling , 2017, J. Mach. Learn. Res.

[5]  Christopher Nemeth,et al.  Control variates for stochastic gradient MCMC , 2017, Statistics and Computing.

[6]  Quanquan Gu,et al.  Stochastic Variance-Reduced Hamilton Monte Carlo Methods , 2018, ICML.

[7]  Martin J. Wainwright,et al.  Log-concave sampling: Metropolis-Hastings algorithms are fast! , 2018, COLT.

[8]  Andrej Risteski,et al.  Beyond Log-concavity: Provable Guarantees for Sampling Multi-modal Distributions using Simulated Tempering Langevin Monte Carlo , 2017, NeurIPS.

[9]  Lawrence Carin,et al.  A convergence analysis for a class of practical variance-reduction stochastic gradient MCMC , 2018, Science China Information Sciences.

[10]  Jinghui Chen,et al.  Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization , 2017, NeurIPS.

[11]  Michael I. Jordan,et al.  Underdamped Langevin MCMC: A non-asymptotic analysis , 2017, COLT.

[12]  Sébastien Bubeck,et al.  Sampling from a Log-Concave Distribution with Projected Langevin Monte Carlo , 2015, Discrete & Computational Geometry.

[13]  Quanquan Gu,et al.  Subsampled Stochastic Variance-Reduced Gradient Langevin Dynamics , 2018, UAI.

[14]  Eric Moulines,et al.  Sampling from a log-concave distribution with compact support with proximal Langevin Monte Carlo , 2017, COLT.

[15]  Arnak S. Dalalyan,et al.  Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent , 2017, COLT.

[16]  Zhanxing Zhu,et al.  Langevin Dynamics with Continuous Tempering for Training Deep Neural Networks , 2017, NIPS.

[17]  Yuchen Zhang,et al.  A Hitting Time Analysis of Stochastic Gradient Langevin Dynamics , 2017, COLT.

[18]  Matus Telgarsky,et al.  Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis , 2017, COLT.

[19]  Stefano Soatto,et al.  Entropy-SGD: biasing gradient descent into wide valleys , 2016, ICLR.

[20]  Arnaud Doucet,et al.  On Markov chain Monte Carlo methods for tall data , 2015, J. Mach. Learn. Res.

[21]  Alexander J. Smola,et al.  Variance Reduction in Stochastic Gradient Langevin Dynamics , 2016, NIPS.

[22]  É. Moulines,et al.  Sampling from a strongly log-concave distribution with the Unadjusted Langevin Algorithm , 2016 .

[23]  Alexander J. Smola,et al.  Stochastic Variance Reduction for Nonconvex Optimization , 2016, ICML.

[24]  Yee Whye Teh,et al.  Consistency and Fluctuations For Stochastic Gradient Langevin Dynamics , 2014, J. Mach. Learn. Res.

[25]  Yee Whye Teh,et al.  Exploration of the (Non-)Asymptotic Bias and Variance of Stochastic Gradient Langevin Dynamics , 2016, J. Mach. Learn. Res.

[26]  É. Moulines,et al.  Non-asymptotic convergence analysis for the Unadjusted Langevin Algorithm , 2015, 1507.05021.

[27]  Michael Betancourt,et al.  The Fundamental Incompatibility of Scalable Hamiltonian Monte Carlo and Naive Data Subsampling , 2015, ICML.

[28]  Tianqi Chen,et al.  A Complete Recipe for Stochastic Gradient MCMC , 2015, NIPS.

[29]  A. Dalalyan Theoretical guarantees for approximate sampling from smooth and log‐concave densities , 2014, 1412.7392.

[30]  Tianqi Chen,et al.  Stochastic Gradient Hamiltonian Monte Carlo , 2014, ICML.

[31]  A. Eberle Error bounds for Metropolis–Hastings algorithms applied to perturbations of Gaussian measures in high dimensions , 2012, 1210.1180.

[32]  Tong Zhang,et al.  Accelerating Stochastic Gradient Descent using Predictive Variance Reduction , 2013, NIPS.

[33]  M. Ledoux,et al.  Analysis and Geometry of Markov Diffusion Operators , 2013 .

[34]  Ahn Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring , 2012 .

[35]  Yee Whye Teh,et al.  Bayesian Learning via Stochastic Gradient Langevin Dynamics , 2011, ICML.

[36]  Radford M. Neal MCMC Using Hamiltonian Dynamics , 2011, 1206.1901.

[37]  E. Vanden-Eijnden,et al.  Non-asymptotic mixing of the MALA algorithm , 2010, 1008.3514.

[38]  P. Cattiaux,et al.  A note on Talagrand’s transportation inequality and logarithmic Sobolev inequality , 2008, 0810.5435.

[39]  D. Bakry,et al.  A simple proof of the Poincaré inequality for a large class of probability measures , 2008 .

[40]  C. Villani,et al.  Weighted Csiszár-Kullback-Pinsker inequalities and applications to transportation inequalities , 2005 .

[41]  A. Bovier,et al.  Metastability in Reversible Diffusion Processes I: Sharp Asymptotics for Capacities and Exit Times , 2004 .

[42]  Jonathan C. Mattingly,et al.  Ergodicity for SDEs and approximations: locally Lipschitz vector fields and degenerate noise , 2002 .

[43]  J. Rosenthal,et al.  Optimal scaling of discrete approximations to Langevin diffusions , 1998 .

[44]  R. Tweedie,et al.  Exponential convergence of Langevin distributions and their discrete approximations , 1996 .

[45]  R. Tweedie,et al.  Geometric convergence and central limit theorems for multidimensional Hastings and Metropolis algorithms , 1996 .

[46]  Peter E. Kloeden,et al.  Applications of Stochastic Differential Equations , 1992 .

[47]  C. Hwang,et al.  Diffusion for global optimization in R^n , 1987 .

[48]  I. Gyöngy Mimicking the one-dimensional marginal distributions of processes having an Itô differential , 1986 .

[49]  W. K. Hastings,et al.  Monte Carlo Sampling Methods Using Markov Chains and Their Applications , 1970 .