Bayes and big data: the consensus Monte Carlo algorithm

A useful definition of ‘big data’ is data that is too big to process comfortably on a single machine, because of processor, memory, or disk bottlenecks. Graphics processing units can alleviate the processor bottleneck, but memory and disk bottlenecks can only be eliminated by splitting data across multiple machines. Communication between large numbers of machines is expensive (regardless of the amount of data being communicated), so there is a need for algorithms that perform distributed approximate Bayesian analyses with minimal communication. Consensus Monte Carlo operates by running a separate Monte Carlo algorithm on each machine, and then averaging the individual Monte Carlo draws across machines. Depending on the model, the resulting draws can be nearly indistinguishable from the draws that would have been obtained by running a single-machine algorithm for a very long time. Examples of consensus Monte Carlo are shown for simple models where single-machine solutions are available, for large single-layer hierarchical models, and for Bayesian additive regression trees (BART).
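To make the averaging step concrete, here is a minimal sketch in Python (my illustration, not code from the paper; the function name and the toy shard posteriors are hypothetical). Each machine s contributes draws theta_s^(g), and consensus draw g is (sum_s W_s)^(-1) sum_s W_s theta_s^(g), where W_s is taken to be the inverse sample covariance of machine s's own draws, a weighting that is exact when every shard-level posterior is Gaussian.

import numpy as np

def consensus_draws(worker_draws):
    """Combine per-machine MCMC draws into consensus draws.

    worker_draws: list of S arrays, each of shape (n_draws, dim),
    one per machine; every machine must return the same number
    of draws. Each machine is weighted by the inverse of the
    sample covariance of its own draws.
    """
    # W_s = inverse sample covariance of machine s's draws.
    weights = [np.linalg.inv(np.atleast_2d(np.cov(d, rowvar=False)))
               for d in worker_draws]
    pooled = np.linalg.inv(sum(weights))  # (sum_s W_s)^(-1)
    # Consensus draw g: (sum_s W_s)^(-1) sum_s W_s theta_s^(g).
    weighted_sum = sum(W @ d.T for W, d in zip(weights, worker_draws))
    return (pooled @ weighted_sum).T  # shape (n_draws, dim)

# Toy check: two machines whose shard-level posteriors are Gaussian.
rng = np.random.default_rng(0)
draws = [rng.multivariate_normal(m, c, size=5000)
         for m, c in [([0.0, 1.0], np.eye(2)),
                      ([0.5, 0.8], 0.5 * np.eye(2))]]
combined = consensus_draws(draws)
print(combined.mean(axis=0))  # approximately the precision-weighted mean of the shards

In this Gaussian toy case the combined draws concentrate around the precision-weighted mean of the shard means; for non-Gaussian posteriors the averaged draws are an approximation whose quality depends on the model, as the abstract notes.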
