Multilevel multiple imputation: A review and evaluation of joint modeling and chained equations imputation.

Although missing data methods have advanced in recent years, methodologists have devoted less attention to multilevel data structures where observations at level-1 are nested within higher-order organizational units at level-2 (e.g., individuals within neighborhoods; repeated measures nested within individuals; students nested within classrooms). Joint modeling and chained equations imputation are the principal imputation frameworks for single-level data, and both have multilevel counterparts. These approaches differ algorithmically and in their functionality; both are appropriate for simple random intercept analyses with normally distributed data, but they differ beyond that. The purpose of this paper is to describe multilevel imputation strategies and evaluate their performance in a variety of common analysis models. Using multiple imputation theory and computer simulations, we derive 4 major conclusions: (a) joint modeling and chained equations imputation are appropriate for random intercept analyses; (b) the joint model is superior for analyses that posit different within- and between-cluster associations (e.g., a multilevel regression model that includes a level-1 predictor and its cluster means, a multilevel structural equation model with different path values at level-1 and level-2); (c) chained equations imputation provides a dramatic improvement over joint modeling in random slope analyses; and (d) a latent variable formulation for categorical variables is quite effective. We use a real data analysis to demonstrate multilevel imputation, and we suggest a number of avenues for future research.
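
As a rough illustration of the analysis models named in conclusions (a) through (c), the three model types can be written as follows. The notation is an assumption introduced here for illustration and is not taken from the paper: $y_{ij}$ and $x_{ij}$ are the outcome and a level-1 predictor for observation $i$ in cluster $j$, $\bar{x}_j$ is the cluster mean of $x$, $u_{0j}$ and $u_{1j}$ are cluster-level random effects, and $\varepsilon_{ij}$ is the level-1 residual.

```latex
% Notation is hypothetical (not from the paper); a sketch of the three analysis models.
\begin{align*}
  % (a) Random intercept model: one common slope for x
  y_{ij} &= \beta_0 + \beta_1 x_{ij} + u_{0j} + \varepsilon_{ij} \\
  % (b) Different within- and between-cluster associations:
  %     level-1 predictor centered at its cluster mean, plus the cluster mean itself
  y_{ij} &= \beta_0 + \beta_w \,(x_{ij} - \bar{x}_j) + \beta_b \,\bar{x}_j + u_{0j} + \varepsilon_{ij} \\
  % (c) Random slope model: the slope of x varies across clusters
  y_{ij} &= \beta_0 + (\beta_1 + u_{1j})\, x_{ij} + u_{0j} + \varepsilon_{ij}
\end{align*}
```

In this sketch, model (b) reduces to model (a) when $\beta_w = \beta_b$, and model (c) reduces to model (a) when the variance of $u_{1j}$ is zero.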
