Informed sub-sampling MCMC: approximate Bayesian inference for large datasets

This paper introduces a framework for speeding up Bayesian inference in the presence of large datasets. We design a Markov chain whose transition kernel uses a fixed-size subset of the available data, which is randomly refreshed throughout the algorithm. Inspired by the Approximate Bayesian Computation literature, the subsampling process is guided by fidelity to the observed data, as measured by summary statistics. The resulting algorithm, Informed Sub-Sampling MCMC, is a generic and flexible approach which, contrary to existing scalable methodologies, preserves the simplicity of the Metropolis–Hastings algorithm. Exactness is lost, i.e., the chain distribution only approximates the posterior, but we study and quantify this bias theoretically and show on a diverse set of examples that the algorithm yields excellent performance when the computational budget is limited. We also show that, when it is available and cheap to compute, setting the summary statistic to the maximum likelihood estimator is supported by theoretical arguments.
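The idea described above can be illustrated with a minimal sketch. It is not the paper's algorithm as published; it is a toy instance under stated assumptions: a Gaussian model with unknown mean and unit variance, a flat prior, the sample mean (the maximum likelihood estimator here) as summary statistic, a one-point swap proposal for refreshing the subset, and a subset weight proportional to exp(-eps * distance²). The function name `informed_subsampling_mh` and the parameters `n_sub`, `eps` and `step` are hypothetical.

```python
import numpy as np


def informed_subsampling_mh(y, n_sub, n_iter, eps, step, rng):
    """Toy sketch: approximate posterior of the mean of N(theta, 1) data (flat prior)."""
    N = y.size
    full_stat = y.mean()  # summary statistic of the full data (assumed: the MLE)

    # Start from a uniformly drawn subset of fixed size n_sub.
    idx = rng.choice(N, size=n_sub, replace=False)
    in_sub = np.zeros(N, dtype=bool)
    in_sub[idx] = True
    theta = y[idx].mean()
    chain = np.empty(n_iter)

    def log_fidelity(indices):
        # Assumed weight: subsets whose summary is close to the full-data one are favoured.
        return -eps * (y[indices].mean() - full_stat) ** 2

    def sub_loglik(value, indices):
        # Subset log-likelihood, rescaled by N / n_sub to mimic the full-data scale.
        return -0.5 * (N / n_sub) * np.sum((y[indices] - value) ** 2)

    for t in range(n_iter):
        # Step 1: refresh the subset by proposing a one-point swap (symmetric proposal).
        pos = rng.integers(n_sub)
        j = rng.integers(N)
        if not in_sub[j]:
            prop = idx.copy()
            prop[pos] = j
            if np.log(rng.uniform()) < log_fidelity(prop) - log_fidelity(idx):
                in_sub[idx[pos]] = False
                in_sub[j] = True
                idx = prop

        # Step 2: plain random-walk Metropolis-Hastings update using only the subset.
        cand = theta + step * rng.normal()
        if np.log(rng.uniform()) < sub_loglik(cand, idx) - sub_loglik(theta, idx):
            theta = cand
        chain[t] = theta

    return chain


# Usage on simulated data (all values below are illustrative choices).
rng = np.random.default_rng(0)
y = rng.normal(2.0, 1.0, size=5000)
chain = informed_subsampling_mh(y, n_sub=100, n_iter=3000, eps=5e3, step=0.02, rng=rng)
print(chain[1000:].mean(), y.mean())
```

Each iteration touches only `n_sub` data points, which is where the computational saving would come from; the price is that the chain targets an approximation of the posterior rather than the posterior itself.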
