Subsampled Stochastic Variance-Reduced Gradient Langevin Dynamics

Stochastic variance-reduced gradient Langevin dynamics (SVRG-LD) was recently proposed to improve the performance of stochastic gradient Langevin dynamics (SGLD) by reducing the variance of the stochastic gradient. In this paper, we propose a variant of SVRG-LD, namely SVRG-LD+, which replaces the full gradient computed at the beginning of each epoch with a subsampled one. We provide a non-asymptotic analysis of the convergence of SVRG-LD+ in 2-Wasserstein distance, and show that SVRG-LD+ enjoys a lower gradient complexity than SVRG-LD when the sample size is large or the required accuracy is moderate. Our analysis directly implies a sharper convergence rate for SVRG-LD as well, which improves the existing convergence rate by a factor of κn, where κ is the condition number of the log-density function and n is the sample size. Experiments on both synthetic and real-world datasets validate our theoretical results.
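To make the epoch structure concrete, below is a minimal NumPy sketch of the subsampling idea, not the paper's pseudocode: the snapshot gradient that standard SVRG-LD computes over all n components is replaced by an average over a random subset of size b < n. The interface grad_f, the parameter names (snapshot_size, inner_len), and the constant step size eta are illustrative assumptions; the noise scale sqrt(2*eta) corresponds to sampling from a density proportional to exp(-F).

```python
import numpy as np

def svrg_ld_plus(grad_f, n, w0, eta, num_epochs, inner_len,
                 batch_size, snapshot_size, rng=None):
    """Sketch of SVRG-LD+ (subsampled SVRG Langevin dynamics).

    grad_f(w, idx) is assumed to return the average gradient of the
    log-density components f_i, i in idx, at w. All names and parameters
    here are illustrative, not the paper's notation.
    """
    rng = np.random.default_rng() if rng is None else rng
    w = w0.copy()
    samples = []
    for _ in range(num_epochs):
        # Snapshot step: SVRG-LD would average over all n indices here;
        # SVRG-LD+ subsamples a set B of size b << n instead.
        snap_w = w.copy()
        B = rng.choice(n, size=snapshot_size, replace=False)
        mu = grad_f(snap_w, B)  # subsampled surrogate for the full gradient
        for _ in range(inner_len):
            I = rng.choice(n, size=batch_size, replace=False)
            # Variance-reduced semi-stochastic gradient estimate.
            g = grad_f(w, I) - grad_f(snap_w, I) + mu
            # Langevin update: gradient step plus injected Gaussian noise.
            noise = rng.standard_normal(w.shape)
            w = w - eta * g + np.sqrt(2.0 * eta) * noise
            samples.append(w.copy())
    return samples
```

Setting snapshot_size = n recovers the usual SVRG-LD snapshot, so the per-epoch saving of SVRG-LD+ comes entirely from the cheaper subsampled snapshot gradient.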
