Stochastic Variance-Reduced Hamilton Monte Carlo Methods

We propose a fast stochastic Hamilton Monte Carlo (HMC) method for sampling from a smooth and strongly log-concave distribution. At the core of our proposed method is a variance reduction technique inspired by recent advances in stochastic optimization. We show that, to achieve $\epsilon$ accuracy in 2-Wasserstein distance, our algorithm requires $\tilde O\big(n+\kappa^{2}d^{1/2}/\epsilon+\kappa^{4/3}d^{1/3}n^{2/3}/\epsilon^{2/3}\big)$ gradient complexity (i.e., number of component gradient evaluations), where $n$ is the number of component functions, $d$ is the dimension, and $\kappa$ is the condition number; this outperforms the state-of-the-art HMC and stochastic gradient HMC methods in a wide regime. We also extend our algorithm to sampling from smooth, general log-concave distributions and prove the corresponding gradient complexity. Experiments on both synthetic and real data demonstrate the superior performance of our algorithm.
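The core idea of the abstract, an SVRG-style control variate plugged into an HMC-type (underdamped Langevin) update, can be illustrated with a minimal sketch. This is not the paper's exact algorithm: the function name `svr_hmc`, the Euler discretization, and all parameter names are our own assumptions, chosen only to show how the variance-reduced gradient estimate enters the momentum update.

```python
import numpy as np

def svr_hmc(grad_fi, n, x0, step, friction, n_epochs, epoch_len, rng=None):
    """Sketch of a stochastic variance-reduced HMC-type sampler.

    Target: density proportional to exp(-f(x)), f(x) = (1/n) * sum_i f_i(x);
    grad_fi(x, i) returns the gradient of the i-th component f_i at x.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x0.shape[0]
    x, v = x0.astype(float).copy(), np.zeros(d)  # position and momentum
    samples = []
    for _ in range(n_epochs):
        snapshot = x.copy()
        # Full gradient at the snapshot point (the control-variate "pivot" of SVRG).
        full_grad = np.mean([grad_fi(snapshot, i) for i in range(n)], axis=0)
        for _ in range(epoch_len):
            i = rng.integers(n)
            # Semi-stochastic gradient: unbiased for grad f(x), with variance
            # that shrinks as x stays close to the snapshot.
            g = grad_fi(x, i) - grad_fi(snapshot, i) + full_grad
            # One Euler step of underdamped Langevin dynamics
            # (Hamiltonian dynamics with friction gamma and injected noise):
            #   dv = -(gamma * v + grad f(x)) dt + sqrt(2 * gamma) dW
            #   dx = v dt
            v = v - step * (friction * v + g) \
                + np.sqrt(2.0 * friction * step) * rng.standard_normal(d)
            x = x + step * v
            samples.append(x.copy())
    return np.array(samples)
```

As a quick sanity check under these assumptions, one can take `grad_fi = lambda x, i: x - y[i]` for data points `y[0], ..., y[n-1]`, which makes the target a standard Gaussian centered at the data mean, and verify that the empirical mean of the returned samples approaches `y.mean(axis=0)`.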
