Multiple Imputation of Missing Data in Multilevel Designs: A Comparison of Different Strategies

Multiple imputation is a widely recommended means of addressing the problem of missing data in psychological research. An often-neglected requirement of this approach is that the imputation model used to generate the imputed values must be at least as general as the analysis model. For multilevel designs in which lower level units (e.g., students) are nested within higher level units (e.g., classrooms), this means that the multilevel structure must be taken into account in the imputation model. In the present article, we compare different strategies for multiply imputing incomplete multilevel data using mathematical derivations and computer simulations. We show that ignoring the multilevel structure in the imputation may lead to substantial negative bias in estimates of intraclass correlations as well as biased estimates of regression coefficients in multilevel models. We also demonstrate that an ad hoc strategy that includes dummy indicators in the imputation model to represent the multilevel structure may be problematic under certain conditions (e.g., small groups, low intraclass correlations). Imputation based on a multivariate linear mixed effects model was the only strategy to produce valid inferences under most of the conditions investigated in the simulation study. Data from an educational psychology research project are also used to illustrate the impact of the various multiple imputation strategies.
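
The attenuation of the intraclass correlation (ICC) under single-level imputation can be illustrated with a small simulation. The following is a minimal sketch and not the authors' simulation design. It assumes a balanced random-intercept model with a true ICC of .30, values missing completely at random, and a deliberately naive single stochastic imputation that draws from one normal distribution fitted to all observed values, ignoring group membership. The function names (`icc_anova`, `simulate`) and all parameter values are hypothetical choices made for illustration.

```python
import random
import statistics


def icc_anova(groups):
    """One-way ANOVA estimator of the ICC for balanced groups."""
    n_groups = len(groups)
    n_per_group = len(groups[0])
    grand_mean = statistics.fmean(v for g in groups for v in g)
    group_means = [statistics.fmean(g) for g in groups]
    # Mean square between groups and mean square within groups.
    msb = n_per_group * sum((m - grand_mean) ** 2 for m in group_means) / (n_groups - 1)
    msw = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    ) / (n_groups * (n_per_group - 1))
    return (msb - msw) / (msb + (n_per_group - 1) * msw)


def simulate(seed=1, n_groups=200, n_per_group=20, icc=0.3, p_miss=0.5):
    """Return (ICC from complete data, ICC after single-level imputation)."""
    rng = random.Random(seed)

    # Random-intercept model: y_ij = u_j + e_ij with total variance 1 (assumed).
    complete = []
    for _ in range(n_groups):
        u = rng.gauss(0.0, icc ** 0.5)
        complete.append(
            [u + rng.gauss(0.0, (1.0 - icc) ** 0.5) for _ in range(n_per_group)]
        )

    # Missing completely at random: each value is deleted with probability p_miss.
    missing = [[rng.random() < p_miss for _ in range(n_per_group)] for _ in range(n_groups)]

    # Naive single-level imputation: one normal distribution for all groups.
    observed = [
        v for g, m in zip(complete, missing) for v, is_missing in zip(g, m) if not is_missing
    ]
    mu = statistics.fmean(observed)
    sd = statistics.stdev(observed)
    imputed = [
        [rng.gauss(mu, sd) if is_missing else v for v, is_missing in zip(g, m)]
        for g, m in zip(complete, missing)
    ]
    return icc_anova(complete), icc_anova(imputed)


if __name__ == "__main__":
    icc_complete, icc_imputed = simulate()
    print(f"ICC, complete data:          {icc_complete:.3f}")
    print(f"ICC, single-level imputation: {icc_imputed:.3f}")
```

Calling `simulate()` returns the ICC estimate from the complete data and from the naively imputed data. Because the imputed values carry no group-level component, the second estimate should fall well below the first, which is the direction of bias described above. A proper multiple imputation would additionally draw the imputation-model parameters from their posterior distribution and repeat the procedure several times; that step is omitted here for brevity.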