Subsampled Stochastic Variance-Reduced Gradient Langevin Dynamics

Stochastic variance-reduced gradient Langevin dynamics (SVRG-LD) was recently proposed to improve the performance of stochastic gradient Langevin dynamics (SGLD) by reducing the variance of the stochastic gradient. In this paper, we propose a variant of SVRG-LD, namely SVRG-LD+, which replaces the full gradient computed at the beginning of each epoch with a subsampled one. We provide a non-asymptotic analysis of the convergence of SVRG-LD+ in 2-Wasserstein distance, and show that SVRG-LD+ enjoys a lower gradient complexity than SVRG-LD when the sample size is large or the required accuracy is moderate. Our analysis directly implies a sharper convergence rate for SVRG-LD as well, which improves the existing convergence rate by a factor of κn, where κ is the condition number of the log-density function and n is the sample size. Experiments on both synthetic and real-world datasets validate our theoretical results.
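To make the epoch structure concrete, below is a minimal NumPy sketch of the subsampling idea, not the paper's pseudocode: the snapshot gradient that standard SVRG-LD computes over all n components is replaced by an average over a random subset of size b < n. The interface grad_f, the parameter names (snapshot_size, inner_len), and the constant step size eta are illustrative assumptions; the noise scale sqrt(2*eta) corresponds to sampling from a density proportional to exp(-F).

```python
import numpy as np

def svrg_ld_plus(grad_f, n, w0, eta, num_epochs, inner_len,
                 batch_size, snapshot_size, rng=None):
    """Sketch of SVRG-LD+ (subsampled SVRG Langevin dynamics).

    grad_f(w, idx) is assumed to return the average gradient of the
    log-density components f_i, i in idx, at w. All names and parameters
    here are illustrative, not the paper's notation.
    """
    rng = np.random.default_rng() if rng is None else rng
    w = w0.copy()
    samples = []
    for _ in range(num_epochs):
        # Snapshot step: SVRG-LD would average over all n indices here;
        # SVRG-LD+ subsamples a set B of size b << n instead.
        snap_w = w.copy()
        B = rng.choice(n, size=snapshot_size, replace=False)
        mu = grad_f(snap_w, B)  # subsampled surrogate for the full gradient
        for _ in range(inner_len):
            I = rng.choice(n, size=batch_size, replace=False)
            # Variance-reduced semi-stochastic gradient estimate.
            g = grad_f(w, I) - grad_f(snap_w, I) + mu
            # Langevin update: gradient step plus injected Gaussian noise.
            noise = rng.standard_normal(w.shape)
            w = w - eta * g + np.sqrt(2.0 * eta) * noise
            samples.append(w.copy())
    return samples
```

Setting snapshot_size = n recovers the usual SVRG-LD snapshot, so the per-epoch saving of SVRG-LD+ comes entirely from the cheaper subsampled snapshot gradient.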
