Double-Parallel Monte Carlo for Bayesian analysis of big data

This paper proposes a simple, practical, and efficient MCMC algorithm for Bayesian analysis of big data. The proposed algorithm divides the big dataset into smaller subsets and provides a simple method for aggregating the subset posteriors to approximate the full-data posterior. To further speed up computation, it uses the population stochastic approximation Monte Carlo algorithm, a parallel MCMC algorithm, to simulate from each subset posterior. Because the algorithm involves two levels of parallelism, data parallelism and simulation parallelism, it is called “Double-Parallel Monte Carlo.” The validity of the proposed algorithm is justified both mathematically and numerically.
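The data-parallel step and a subset-posterior aggregation can be illustrated on a toy model. This is a minimal sketch under stated assumptions, not the paper's implementation. It assumes a Gaussian-mean model with known variance and a Gaussian prior. It raises each subset likelihood to the power K so that each subset posterior has roughly the spread of the full-data posterior. It replaces population stochastic approximation Monte Carlo with exact conjugate draws. As a stand-in for the paper's aggregation method, it uses a recentering rule: shift each subset's draws to the average of the subset posterior means, then pool them. The function names `subset_posterior_draws` and `double_parallel_sketch` are hypothetical.

```python
import numpy as np


def subset_posterior_draws(y_sub, K, sigma=1.0, prior_var=100.0, n_draws=2000, rng=None):
    """Exact draws from a subset posterior whose likelihood is raised to the power K.

    Assumed toy model: y_i ~ N(theta, sigma^2) with known sigma, theta ~ N(0, prior_var).
    Exact conjugate sampling stands in for a (population) MCMC sampler here.
    """
    rng = np.random.default_rng() if rng is None else rng
    m = len(y_sub)
    # Posterior precision and mean when the subset likelihood is raised to the power K.
    precision = 1.0 / prior_var + K * m / sigma**2
    mean = (K * y_sub.sum() / sigma**2) / precision
    return rng.normal(mean, np.sqrt(1.0 / precision), size=n_draws)


def double_parallel_sketch(y, K, n_draws=2000, seed=0):
    """Split the data, sample each subset posterior, and pool recentered draws.

    The recentering rule below is an assumption for illustration only.
    """
    rng = np.random.default_rng(seed)
    # Data-parallel split: in practice each subset would go to its own worker.
    subsets = np.array_split(y, K)
    draws = [subset_posterior_draws(s, K, n_draws=n_draws, rng=rng) for s in subsets]
    # Average of the subset posterior means (estimated from the draws).
    grand_mean = np.mean([d.mean() for d in draws])
    # Shift each subset's draws so their mean equals grand_mean, then pool all draws.
    return np.concatenate([d - d.mean() + grand_mean for d in draws])
```

In this conjugate model with equal-sized subsets, the average of the subset posterior means equals the full-data posterior mean exactly. Each tempered subset posterior also has the full-data posterior variance. The pooled draws therefore match the full posterior up to Monte Carlo error. For non-Gaussian posteriors or unequal subsets, a recentering rule of this kind is only an approximation.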
