A Fully Conditional Specification Approach to Multilevel Imputation of Categorical and Continuous Variables

Abstract

Specialized imputation routines for multilevel data are widely available in software packages, but these methods are generally not equipped to handle a wide range of complexities that are typical of behavioral science data. In particular, existing imputation schemes differ in their ability to handle random slopes, categorical variables, differential relations at Level-1 and Level-2, and incomplete Level-2 variables. Given the limitations of existing imputation tools, the purpose of this manuscript is to describe a flexible imputation approach that can accommodate a diverse set of 2-level analysis problems that includes any of the aforementioned features. The procedure employs a fully conditional specification (also known as chained equations) approach with a latent variable formulation for handling incomplete categorical variables. Computer simulations suggest that the proposed procedure works quite well, with trivial biases in most cases. We provide a software program that implements the imputation strategy, and we use an artificial data set to illustrate its use.
