Multiple Imputation for Multilevel Data with Continuous and Binary Variables

We present and compare multiple imputation methods for multilevel continuous and binary data in which variables are systematically and sporadically missing. The methods are compared from a theoretical point of view and through an extensive simulation study motivated by a real dataset comprising multiple studies. The comparisons show why these multiple imputation methods are the most appropriate for handling missing values in a multilevel setting and why their relative performance can vary according to the missing data pattern, the multilevel structure and the type of missing variables. The study shows that valid inferences can only be obtained if the dataset includes a large number of clusters. In addition, it highlights that heteroscedastic multiple imputation methods provide more accurate inferences than homoscedastic methods, which should be reserved for data with few individuals per cluster. Finally, guidelines are given for choosing the most suitable multiple imputation method according to the structure of the data.
