Scalable inference for crossed random effects models

We analyze the complexity of Gibbs samplers for inference in crossed random effects models used in modern analysis of variance. We demonstrate that for certain designs the vanilla Gibbs sampler is not scalable, in the sense that its complexity grows faster than linearly in the number of parameters and data points. We therefore propose a simple modification that yields a collapsed Gibbs sampler that is provably scalable. Although our theory requires some balancedness assumptions on the data designs, we demonstrate on simulated and real datasets that the rates it predicts match the correct rates remarkably well even in cases where the assumptions are violated. We also show that the collapsed Gibbs sampler, extended to sample further unknown hyperparameters, significantly outperforms alternative state-of-the-art algorithms.
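To make the idea of collapsing concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a two-factor model y[i, j] = mu + a[i] + b[j] + noise on a fully observed I x J grid, a flat prior on mu, and known precisions. The function name `collapsed_gibbs_sweep` and the parameters `tau_a`, `tau_b`, `tau_e` are hypothetical names introduced for this illustration. Each block draws the global mean with one factor integrated out, then draws that factor given the mean.

```python
import numpy as np

def collapsed_gibbs_sweep(y, mu, a, b, tau_a, tau_b, tau_e, rng):
    """One sweep of a collapsed Gibbs sampler (illustrative sketch).

    Assumed model: y[i, j] = mu + a[i] + b[j] + eps[i, j], with
    a[i] ~ N(0, 1/tau_a), b[j] ~ N(0, 1/tau_b), eps ~ N(0, 1/tau_e),
    a flat prior on mu, and all precisions known.
    The incoming values of mu and a are overwritten; only b is conditioned on.
    """
    I, J = y.shape

    # Block 1: update (mu, a) jointly given b.
    r = (y - b[None, :]).mean(axis=1)            # row means of residuals
    w = J * tau_e * tau_a / (tau_a + J * tau_e)  # precision of each row mean about mu
    mu = rng.normal(r.mean(), 1.0 / np.sqrt(I * w))  # mu with a integrated out
    prec = tau_a + J * tau_e
    a = rng.normal(J * tau_e * (r - mu) / prec, 1.0 / np.sqrt(prec))

    # Block 2: update (mu, b) jointly given a.
    c = (y - a[:, None]).mean(axis=0)            # column means of residuals
    w = I * tau_e * tau_b / (tau_b + I * tau_e)
    mu = rng.normal(c.mean(), 1.0 / np.sqrt(J * w))  # mu with b integrated out
    prec = tau_b + I * tau_e
    b = rng.normal(I * tau_e * (c - mu) / prec, 1.0 / np.sqrt(prec))

    return mu, a, b

# Usage on simulated data (all settings below are arbitrary choices).
rng = np.random.default_rng(0)
I, J = 20, 20
tau_a = tau_b = tau_e = 1.0
y = (2.0 + rng.normal(size=(I, 1)) + rng.normal(size=(1, J))
     + rng.normal(size=(I, J)))

mu, a, b = 0.0, np.zeros(I), np.zeros(J)
mu_draws = np.empty(1000)
for t in range(mu_draws.size):
    mu, a, b = collapsed_gibbs_sweep(y, mu, a, b, tau_a, tau_b, tau_e, rng)
    mu_draws[t] = mu
```

A vanilla sampler would instead update mu, a, and b one at a time from their full conditionals. The sketch differs only in that mu is refreshed together with each factor, which is the kind of modification the abstract describes. Extending it to unbalanced designs or unknown precisions is not shown here.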
