An Efficient Minibatch Acceptance Test for Metropolis-Hastings

We present a novel Metropolis-Hastings method for large datasets that uses small expected-size minibatches of data. Previous work on reducing the cost of Metropolis-Hastings tests yields methods that consume a variable amount of data per sample, offering only constant-factor reductions over using the full dataset for each test. Here we present a method that can be tuned to provide arbitrarily small batch sizes by adjusting either the proposal step size or the temperature. Our test uses the noise-tolerant Barker acceptance test with a novel additive correction variable. The resulting test has a cost similar to that of a standard SGD update. Our experiments demonstrate speedups of several orders of magnitude over previous work.
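
A minimal sketch may help fix the idea. The exact Barker test accepts a move with probability 1/(1 + exp(-Δ)), where Δ is the log acceptance ratio; equivalently, it accepts iff Δ + L > 0 with L drawn from a standard logistic distribution. A minibatch estimate of Δ carries approximately Gaussian noise, and the method adds a correction variable chosen so that Gaussian noise plus correction is logistic. The sketch below is illustrative only, not the paper's implementation: the function names are ours, and where the paper constructs the exact correction distribution by deconvolution, we substitute a Gaussian with the complementary variance, which matches the logistic noise only approximately.

```python
import numpy as np

rng = np.random.default_rng(0)
LOGISTIC_VAR = np.pi ** 2 / 3  # variance of the standard logistic distribution


def barker_test(delta):
    """Exact Barker test: accept with probability 1 / (1 + exp(-delta)).

    Equivalent form: accept iff delta + L > 0, with L ~ Logistic(0, 1).
    """
    return delta + rng.logistic() > 0


def minibatch_barker_test(delta_hat, noise_var):
    """Barker test driven by a noisy minibatch estimate of delta.

    delta_hat is assumed to carry Gaussian noise of variance noise_var,
    with noise_var < pi^2 / 3. The paper samples the additive correction so
    that Gaussian noise + correction is exactly Logistic(0, 1); the Gaussian
    correction below is a crude stand-in that matches only the variance.
    """
    assert noise_var < LOGISTIC_VAR, "minibatch noise must be below logistic variance"
    correction = rng.normal(scale=np.sqrt(LOGISTIC_VAR - noise_var))
    return delta_hat + correction > 0


# Sanity check: with the (approximate) correction, the acceptance frequency
# of the noisy test stays close to the exact Barker probability.
delta, noise_var = 0.5, 1.0
noisy = np.mean([
    minibatch_barker_test(delta + rng.normal(scale=np.sqrt(noise_var)), noise_var)
    for _ in range(100_000)
])
print(f"noisy test: {noisy:.3f}  exact Barker: {1 / (1 + np.exp(-delta)):.3f}")
```

Roughly, the correction construction requires the estimator's noise variance to stay below the logistic variance; shrinking the proposal step size (or raising the temperature) shrinks the per-example variance of Δ, which is what lets the expected batch size be tuned down as the abstract claims.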
