Bayesian model selection for multilevel models using integrated likelihoods

Multilevel linear models allow flexible statistical modelling of complex data with different levels of stratification. Identifying the most appropriate model from the large set of possible candidates is a challenging problem. In the Bayesian setting, the standard approach is to compare models using the model evidence or the Bayes factor. Explicit expressions for these quantities are available only for the simplest linear models with unrealistic priors; in most cases, direct computation is impossible. In practice, Monte Carlo methods such as Markov chain Monte Carlo and sequential Monte Carlo are widely used, but it is not always clear how well such techniques perform. We present a method for estimating the log model evidence by an intermediate marginalisation over the non-variance parameters (the regression coefficients, including group-level effects), so that only the variance parameters remain to be sampled. This reduces the dimensionality of any Monte Carlo sampling algorithm, which in turn yields estimates that are more consistent across runs. The aim of this paper is to show how this framework fits together and works in practice, particularly on data with hierarchical structure. We illustrate the method on simulated multilevel data and on a popular dataset of radon levels in homes in the US state of Minnesota.
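The following is a minimal sketch of the idea of marginalising over non-variance parameters before sampling, not the paper's implementation. It assumes a random-intercept model y = Xβ + Zu + ε with zero-mean Gaussian priors β ~ N(0, σ_β² I) and u ~ N(0, σ_u² I) and Gaussian noise ε ~ N(0, σ² I). Under these assumptions β and u integrate out in closed form, giving y | σ_u, σ ~ N(0, σ_β² XXᵀ + σ_u² ZZᵀ + σ² I). The HalfNormal(1) priors on σ_u and σ, the fixed σ_β = 10, plain prior-sampling Monte Carlo (a stand-in for the samplers discussed above) and all function names are hypothetical choices made for illustration.

```python
import numpy as np


def log_marginal_given_variances(y, X, Z, sigma_beta, sigma_u, sigma):
    """log p(y | sigma_u, sigma) with beta and u integrated out analytically.

    Assumes (hypothetically) beta ~ N(0, sigma_beta^2 I), u ~ N(0, sigma_u^2 I)
    and eps ~ N(0, sigma^2 I), so y is zero-mean Gaussian with covariance
    sigma_beta^2 X X' + sigma_u^2 Z Z' + sigma^2 I. Pass Z=None for a pooled model.
    """
    n = len(y)
    cov = sigma_beta**2 * X @ X.T + sigma**2 * np.eye(n)
    if Z is not None:
        cov += sigma_u**2 * Z @ Z.T
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)


def log_evidence(y, X, Z=None, sigma_beta=10.0, n_samples=2000, seed=0):
    """Monte Carlo estimate of log p(y) over the variance parameters only.

    Draws sigma and sigma_u from hypothetical HalfNormal(1) priors and averages
    the integrated likelihood, using log-mean-exp for numerical stability.
    """
    rng = np.random.default_rng(seed)
    sigmas = np.abs(rng.normal(size=n_samples))
    sigma_us = np.abs(rng.normal(size=n_samples))  # unused when Z is None
    logliks = np.array([
        log_marginal_given_variances(y, X, Z, sigma_beta, su, s)
        for su, s in zip(sigma_us, sigmas)
    ])
    m = logliks.max()
    return m + np.log(np.mean(np.exp(logliks - m)))


# Simulated data: 8 groups of 15 observations with group-level intercepts.
rng = np.random.default_rng(1)
n_groups, per_group = 8, 15
groups = np.repeat(np.arange(n_groups), per_group)
x = rng.normal(size=groups.size)
group_effects = rng.normal(scale=2.0, size=n_groups)
y = 1.0 + 0.5 * x + group_effects[groups] + rng.normal(scale=0.5, size=groups.size)

X = np.column_stack([np.ones_like(x), x])  # fixed effects: intercept and slope
Z = np.eye(n_groups)[groups]               # one-hot group indicators

# Log Bayes factor of the multilevel model against the pooled model.
log_bf = log_evidence(y, X, Z) - log_evidence(y, X)
print(f"log Bayes factor (multilevel vs pooled): {log_bf:.1f}")
```

The Monte Carlo step here runs over two variance parameters instead of the 2 + 8 + 2 parameters of the full model, which is the dimensionality reduction the method relies on. Prior sampling is the simplest possible estimator and becomes inefficient when the posterior is concentrated. In practice it would be replaced by importance sampling or sequential Monte Carlo over the same low-dimensional variance space.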
