A comparison of multiple imputation methods for missing data in longitudinal studies

Background: Multiple imputation (MI) is now widely used to handle missing data in longitudinal studies. Several MI techniques have been proposed to impute incomplete longitudinal covariates, including standard fully conditional specification (FCS-Standard) and joint multivariate normal imputation (JM-MVN), which treat repeated measurements as distinct variables, and various extensions based on generalized linear mixed models. Although these MI approaches have been implemented in various software packages, there has not been a comprehensive evaluation of their relative performance in the context of longitudinal data.

Method: Using both empirical data and a simulation study based on data from the six waves of the Longitudinal Study of Australian Children (N = 4661), we investigated the performance of a wide range of MI methods available in standard software packages for investigating the association between child body mass index (BMI) and quality of life, using both a linear regression and a linear mixed-effects model.

Results: We identified and compared 12 different MI methods for imputing missing data in longitudinal studies. Analysis of simulated data under missing at random (MAR) mechanisms showed that the generally available MI methods provided less biased estimates with better coverage for the linear regression model, and around half of these methods performed well for the estimation of regression parameters for a linear mixed model with random intercept. In the observed data, we found an inverse association between child BMI and quality of life, both with available data and with multiple imputation.

Conclusion: Both FCS-Standard and JM-MVN performed well for the estimation of regression parameters in both analysis models. More complex methods that explicitly reflect the longitudinal structure for these analysis models may only be needed in specific circumstances such as irregularly spaced data.
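
To make concrete what it means for FCS-Standard and JM-MVN to "treat repeated measurements as distinct variables", the following minimal sketch reshapes long-format records (one row per child per wave) into wide format (one row per child, one column per wave), which is the layout such methods would then impute. It is an illustration only, not the authors' code: the function `long_to_wide`, the field names `child_id`, `wave` and `bmi`, and the toy values are all assumptions made for this example, and the imputation step itself is not shown.

```python
from collections import defaultdict


def long_to_wide(records, waves, value_key="bmi"):
    """Reshape long-format records into one row per child.

    Each wave becomes its own column (e.g. bmi_w1, bmi_w2), so a
    measurement missing at a given wave appears as None in that column.
    All names here are hypothetical and chosen for illustration.
    """
    by_child = defaultdict(dict)
    for rec in records:
        by_child[rec["child_id"]][rec["wave"]] = rec[value_key]

    wide = []
    for child_id in sorted(by_child):
        row = {"child_id": child_id}
        for w in waves:
            # .get returns None when the child has no record at wave w
            row[f"{value_key}_w{w}"] = by_child[child_id].get(w)
        wide.append(row)
    return wide


# Toy data (made up): child 2 has no wave 2 measurement.
records = [
    {"child_id": 1, "wave": 1, "bmi": 15.2},
    {"child_id": 1, "wave": 2, "bmi": 15.9},
    {"child_id": 2, "wave": 1, "bmi": 16.4},
]
wide = long_to_wide(records, waves=[1, 2])
for row in wide:
    print(row)
```

In this wide layout each wave's measurement is a separate column with its own missing entries, so no explicit model of the within-child correlation over time is needed. This layout relies on measurement occasions being shared across children, which is consistent with the abstract's note that more complex methods may be needed for irregularly spaced data.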
