Sampling for Bayesian Computation with Large Datasets

Multilevel models are extremely useful for handling large hierarchical datasets. However, computation can be a challenge, both in storage and in CPU time per iteration of the Gibbs sampler or other Markov chain Monte Carlo algorithms. We propose a computational strategy based on sampling the data, computing a separate posterior distribution from each sample, and then combining these to obtain a consensus posterior inference. With hierarchical data structures, we perform cluster sampling into subsets that have the same structure as the original data. This reduces both the number of parameters and the sample size for each separate model fit. We illustrate with examples from climate modeling and newspaper marketing.
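The strategy above (cluster-sample the groups into subsets, fit each subset separately, then combine) can be sketched on simulated data. This is only an illustrative sketch, not the procedure from the paper. The function names, the simulated two-level normal data, the normal approximation to each subset posterior, and the precision-weighted combination rule are all assumptions introduced here; the abstract does not specify how the separate posteriors are combined.

```python
import random
import statistics


def make_hierarchical_data(rng, n_groups=200, n_per_group=30,
                           mu=1.5, tau=1.0, sigma=2.0):
    """Simulate a two-level dataset (hypothetical): group means ~ N(mu, tau^2),
    observations within a group ~ N(group mean, sigma^2)."""
    data = []
    for _ in range(n_groups):
        group_mean = rng.gauss(mu, tau)
        data.append([rng.gauss(group_mean, sigma) for _ in range(n_per_group)])
    return data


def cluster_subsets(rng, n_groups, n_subsets):
    """Cluster sampling: assign whole groups (never single observations)
    to disjoint subsets, so each subset keeps the hierarchical structure."""
    order = list(range(n_groups))
    rng.shuffle(order)
    return [order[k::n_subsets] for k in range(n_subsets)]


def subset_posterior(groups):
    """Assumed stand-in for a full model fit: a normal approximation to the
    posterior of the population mean mu under a flat prior, based on the
    group means in this subset."""
    group_means = [statistics.fmean(g) for g in groups]
    post_mean = statistics.fmean(group_means)
    post_var = statistics.variance(group_means) / len(group_means)
    return post_mean, post_var


def combine_normal(posteriors):
    """Assumed combination rule: precision-weighted average of the
    subset-level normal approximations."""
    precisions = [1.0 / var for _, var in posteriors]
    combined_var = 1.0 / sum(precisions)
    combined_mean = combined_var * sum(
        p * mean for p, (mean, _) in zip(precisions, posteriors)
    )
    return combined_mean, combined_var


rng = random.Random(0)
data = make_hierarchical_data(rng)
subsets = cluster_subsets(rng, len(data), n_subsets=4)
posteriors = [subset_posterior([data[j] for j in idx]) for idx in subsets]
consensus_mean, consensus_var = combine_normal(posteriors)
print(f"consensus mean: {consensus_mean:.3f}, sd: {consensus_var ** 0.5:.3f}")
```

In a real application each call to `subset_posterior` would be replaced by a full multilevel model fit (for example, a Gibbs sampler run on that subset), which is where the savings in storage and per-iteration time would arise, since each fit involves fewer groups and therefore fewer group-level parameters.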
