On the Theory of Variance Reduction for Stochastic Gradient Monte Carlo

We provide convergence guarantees in Wasserstein distance for a variety of variance-reduction methods: the SAGA Langevin diffusion, the SVRG Langevin diffusion, and the control-variate underdamped Langevin diffusion. We analyze these methods under a uniform set of assumptions on the negative log-posterior: smoothness, strong convexity, and Lipschitz continuity of the Hessian. The analysis rests on a new proof technique that combines ideas from finite-sum optimization with the analysis of sampling methods. Our sharp theoretical bounds allow us to identify regimes of interest in which each method outperforms the others. The theory is verified with experiments on both real-world and synthetic datasets.
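As a concrete illustration of one of these methods, here is a minimal Python sketch of SVRG Langevin dynamics for a finite-sum negative log-posterior U(theta) = sum_{i=1}^N u_i(theta). The function name grad_sum, the epoch structure, and the parameters are illustrative assumptions rather than the paper's implementation; the sketch only shows how the variance-reduced gradient estimate enters the discretized diffusion.

```python
import numpy as np

def svrg_langevin(grad_sum, theta0, N, h, n_epochs, epoch_len, batch_size, rng=None):
    """Sketch of SVRG Langevin dynamics (assumed interface, for illustration).

    grad_sum(theta, idx): sum of the per-datum gradients of the negative
    log-posterior over the indices in idx, i.e. sum_{i in idx} grad u_i(theta).
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float).copy()
    samples = []
    for _ in range(n_epochs):
        anchor = theta.copy()
        full_grad = grad_sum(anchor, np.arange(N))  # full gradient at the anchor point
        for _ in range(epoch_len):
            idx = rng.integers(0, N, size=batch_size)
            # Unbiased, variance-reduced estimate of the full gradient at theta;
            # its variance shrinks as theta approaches the anchor.
            g = (N / batch_size) * (grad_sum(theta, idx) - grad_sum(anchor, idx)) + full_grad
            # Euler-Maruyama step for the overdamped Langevin diffusion
            # targeting the posterior proportional to exp(-U(theta)).
            theta = theta - h * g + np.sqrt(2.0 * h) * rng.standard_normal(theta.shape)
            samples.append(theta.copy())
    return np.array(samples)
```

The other two methods modify the gradient estimate rather than the diffusion: SAGA Langevin dynamics replaces the epoch-wise anchor with a running table of stored per-datum gradients, while the control-variate underdamped variant centers the stochastic gradient at a fixed reference point (such as an approximate mode) inside a momentum-based discretization.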
