Bayesian Comparison of Latent Variable Models: Conditional Versus Marginal Likelihoods

Typical Bayesian methods for models with latent variables (or random effects) involve directly sampling the latent variables along with the model parameters. In high-level software code for model definitions (using, e.g., BUGS, JAGS, Stan), the likelihood is therefore specified as conditional on the latent variables. This can lead researchers to perform model comparisons via conditional likelihoods, where the latent variables are considered model parameters. In other settings, however, typical model comparisons involve marginal likelihoods, where the latent variables are integrated out. This distinction is often overlooked even though it can have a large impact on the comparisons of interest. In this paper, we clarify and illustrate these issues, focusing on the comparison of conditional and marginal Deviance Information Criteria (DICs) and Watanabe-Akaike Information Criteria (WAICs) in psychometric modeling. The conditional/marginal distinction corresponds to whether the model should be predictive for the clusters that are in the data or for new clusters (where "clusters" typically correspond to higher-level units like people or schools). Correspondingly, we show that marginal WAIC corresponds to leave-one-cluster-out cross-validation, whereas conditional WAIC corresponds to leave-one-unit-out cross-validation. These results lead to recommendations on the general application of the criteria to models with latent variables.
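
The conditional/marginal distinction can be made concrete with a minimal sketch. It is not the paper's implementation. It assumes a normal random-intercept model, y_ij = mu + zeta_j + e_ij with zeta_j ~ N(0, tau^2) and e_ij ~ N(0, sigma^2), chosen only because the latent variable can then be integrated out in closed form; psychometric models such as IRT models would typically require numerical integration instead. All function names, the toy data, and the "posterior draws" are hypothetical placeholders standing in for real MCMC output.

```python
import math

def normal_logpdf(x, mean, sd):
    """Log density of N(mean, sd^2) at x."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - 0.5 * ((x - mean) / sd) ** 2

def conditional_loglik(y_cluster, mu, zeta_j, sigma):
    """One log-likelihood per unit, treating the sampled zeta_j as a parameter."""
    return [normal_logpdf(y, mu + zeta_j, sigma) for y in y_cluster]

def marginal_loglik(y_cluster, mu, tau, sigma):
    """One log-likelihood per cluster, with zeta_j integrated out analytically.

    Uses the compound-symmetry covariance sigma^2 * I + tau^2 * 11'.
    """
    n = len(y_cluster)
    resid = [y - mu for y in y_cluster]
    s1 = sum(resid)
    s2 = sum(r * r for r in resid)
    total_var = sigma ** 2 + n * tau ** 2
    log_det = (n - 1) * math.log(sigma ** 2) + math.log(total_var)
    quad = (s2 - tau ** 2 * s1 ** 2 / total_var) / sigma ** 2
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)

def waic(loglik_draws):
    """WAIC on the deviance scale from loglik_draws[s][i] (draw s, prediction unit i)."""
    n_draws = len(loglik_draws)
    total = 0.0
    for i in range(len(loglik_draws[0])):
        col = [loglik_draws[s][i] for s in range(n_draws)]
        m = max(col)
        # Log pointwise predictive density via a stable log-mean-exp.
        lppd_i = m + math.log(sum(math.exp(v - m) for v in col) / n_draws)
        # Effective number of parameters: sample variance of the log-likelihood.
        mean_i = sum(col) / n_draws
        p_i = sum((v - mean_i) ** 2 for v in col) / (n_draws - 1)
        total += lppd_i - p_i
    return -2.0 * total

# Toy data: two clusters. The draws below are made-up numbers, not real MCMC output.
y = [[1.2, 0.8, 1.5], [-0.3, 0.1]]
draws = [
    {"mu": 0.5, "tau": 1.0, "sigma": 0.5, "zeta": [0.6, -0.5]},
    {"mu": 0.4, "tau": 0.9, "sigma": 0.6, "zeta": [0.8, -0.6]},
    {"mu": 0.6, "tau": 1.1, "sigma": 0.5, "zeta": [0.5, -0.7]},
]

# Conditional: the prediction unit is a single observation (5 columns here).
cond = [
    [ll for j, yj in enumerate(y)
     for ll in conditional_loglik(yj, d["mu"], d["zeta"][j], d["sigma"])]
    for d in draws
]
# Marginal: the prediction unit is a whole cluster (2 columns here).
marg = [[marginal_loglik(yj, d["mu"], d["tau"], d["sigma"]) for yj in y] for d in draws]

waic_conditional = waic(cond)
waic_marginal = waic(marg)
```

The same `waic` function is applied in both cases; only the pointwise log-likelihood matrix changes. In the conditional version each column is one observation given its cluster's sampled latent variable, whereas in the marginal version each column is one cluster with the latent variable integrated out, which is why the two criteria target predictions for existing and for new clusters, respectively.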
