AMAGOLD: Amortized Metropolis Adjustment for Efficient Stochastic Gradient MCMC

Stochastic gradient Hamiltonian Monte Carlo (SGHMC) is an efficient method for sampling from continuous distributions. It is a faster alternative to HMC: instead of using the whole dataset at each iteration, SGHMC uses only a subsample. This improves performance, but introduces bias that can cause SGHMC to converge to the wrong distribution. One can prevent this using a step size that decays to zero, but such a step size schedule can drastically slow down convergence. To address this tension, we propose a novel second-order SG-MCMC algorithm---AMAGOLD---that infrequently uses Metropolis-Hastings (M-H) corrections to remove bias. The infrequency of corrections amortizes their cost. We prove AMAGOLD converges to the target distribution with a fixed, rather than a diminishing, step size, and that its convergence rate is at most a constant factor slower than a full-batch baseline. We empirically demonstrate AMAGOLD's effectiveness on synthetic distributions, Bayesian logistic regression, and Bayesian neural networks.
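The core idea in the abstract, running many cheap minibatch-gradient steps and only occasionally paying for a full-batch Metropolis-Hastings test, can be illustrated with a minimal sketch. This is a simplified toy (a made-up Gaussian-mean posterior, not the paper's experiments, and plain stochastic-gradient leapfrog rather than AMAGOLD's skew-reversible second-order dynamic): each trajectory of T minibatch leapfrog steps gets a single full-batch M-H correction, amortizing the correction's cost over T steps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem (hypothetical, not the paper's setup): infer the mean `theta`
# of Gaussian data under an N(0, 1) prior with unit-variance likelihood,
# so the exact posterior mean data.sum() / (N + 1) is known for checking.
data = rng.normal(1.0, 1.0, size=100)
N, batch = len(data), 50

def full_U(theta):
    """Full-batch negative log posterior (used only for the M-H test)."""
    return 0.5 * theta ** 2 + 0.5 * np.sum((data - theta) ** 2)

def stoch_grad_U(theta):
    """Unbiased minibatch estimate of dU/dtheta."""
    idx = rng.integers(0, N, size=batch)
    return theta - (N / batch) * np.sum(data[idx] - theta)

def amortized_mh_sghmc(n_outer=3000, T=10, eps=0.05):
    """Run minibatch leapfrog trajectories with one full-batch M-H test
    per trajectory -- a sketch of the amortized-correction idea, not the
    AMAGOLD algorithm itself (whose reversible dynamic makes the test exact)."""
    theta, samples = 0.0, []
    for _ in range(n_outer):
        v = rng.normal()                      # resample momentum
        theta0 = theta
        H0 = full_U(theta) + 0.5 * v ** 2     # full-batch energy at start
        for _ in range(T):                    # T cheap minibatch leapfrog steps
            v -= 0.5 * eps * stoch_grad_U(theta)
            theta += eps * v
            v -= 0.5 * eps * stoch_grad_U(theta)
        H1 = full_U(theta) + 0.5 * v ** 2     # full-batch energy at end
        if rng.random() >= np.exp(min(0.0, H0 - H1)):
            theta = theta0                    # reject: revert the whole trajectory
        samples.append(theta)
    return np.array(samples)

samples = amortized_mh_sghmc()
post_mean = data.sum() / (N + 1)              # exact posterior mean for this toy
print(samples[1000:].mean())
```

Only two full-batch energy evaluations are paid per T-step trajectory, so the per-step overhead of the correction shrinks as T grows; this is the amortization the abstract refers to, here in deliberately simplified form.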
