A comparison of multiple imputation methods for handling missing values in longitudinal data in the presence of a time-varying covariate with a non-linear association with time: a simulation study

Background: Missing data is a common problem in epidemiological studies, and is particularly prominent in longitudinal data, which involve multiple waves of data collection. Traditional multiple imputation (MI) methods, namely fully conditional specification (FCS) and multivariate normal imputation (MVNI), treat each repeated measurement of the same time-dependent variable as a distinct variable for imputation and therefore do not make the most of the longitudinal structure of the data. Only a few studies have explored extensions to the standard approaches to account for the temporal structure of longitudinal data. One suggestion is the two-fold fully conditional specification (two-fold FCS) algorithm, which restricts the imputation of a time-dependent variable to time blocks, so that the imputation model for a given time point includes only measurements taken at that time and at adjacent times. To date, no study has investigated the performance of two-fold FCS and standard MI methods for handling missing data in a time-varying covariate with a non-linear trajectory over time, a commonly encountered scenario in epidemiological studies.

Methods: We simulated 1000 datasets of 5000 individuals based on the Longitudinal Study of Australian Children (LSAC). Three missing data mechanisms, missing completely at random (MCAR), weak missing at random (MAR) and strong MAR, were used to impose missingness on body mass index (BMI)-for-age z-scores, a continuous time-varying exposure variable with a non-linear trajectory over time. We evaluated the performance of FCS, MVNI, and two-fold FCS for handling up to 50% of missing data when assessing the association between childhood obesity and sleep problems.

Results: The standard two-fold FCS produced slightly more biased and less precise estimates than FCS and MVNI. We observed slight improvements in bias and precision when using a time window width of two for the two-fold FCS algorithm compared to the standard width of one.

Conclusion: We recommend the use of FCS or MVNI in similar longitudinal settings. When convergence issues arise because of a large number of time points or variables with missing values, we recommend two-fold FCS, with exploration of a suitable time window width.
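To make two of the ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' simulation code. It shows (a) which waves would feed the imputation model for a given wave under a two-fold FCS time window of a given width, and (b) one common way to impose MCAR or MAR missingness using a logistic model. The function names (`window_waves`, `missing_prob`, `impose_missingness`), the logistic form, and the `strength` parameter used to distinguish "weak" from "strong" MAR are all illustrative assumptions; the abstract does not specify how the study defined its mechanisms.

```python
import math
import random


def window_waves(t, n_waves, width=1):
    """Waves (1-based) whose measurements enter the imputation model for
    wave t under a time window of the given width, excluding t itself.
    Hypothetical helper: it only illustrates the windowing idea."""
    lo = max(1, t - width)
    hi = min(n_waves, t + width)
    return [s for s in range(lo, hi + 1) if s != t]


def missing_prob(x, base_rate, strength):
    """Probability that a value is set to missing, given a fully observed
    driver x. Assumed logistic model: strength = 0 gives MCAR (constant
    probability base_rate); larger strength gives a stronger MAR mechanism."""
    intercept = math.log(base_rate / (1.0 - base_rate))
    return 1.0 / (1.0 + math.exp(-(intercept + strength * x)))


def impose_missingness(values, driver, base_rate, strength, rng):
    """Return a copy of values with entries replaced by None according to
    missing_prob evaluated on the matching driver value."""
    out = []
    for v, x in zip(values, driver):
        p = missing_prob(x, base_rate, strength)
        out.append(None if rng.random() < p else v)
    return out


if __name__ == "__main__":
    rng = random.Random(2024)
    driver = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # observed covariate
    zscore = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # stand-in exposure
    for label, strength in [("MCAR", 0.0), ("weak MAR", 0.5), ("strong MAR", 1.5)]:
        incomplete = impose_missingness(zscore, driver, 0.3, strength, rng)
        share = sum(v is None for v in incomplete) / len(incomplete)
        print(label, "proportion missing:", round(share, 3))
    print("width 1, wave 3 of 5:", window_waves(3, 5, width=1))
    print("width 2, wave 3 of 5:", window_waves(3, 5, width=2))
```

With a width of one, a measurement at wave 3 is imputed from waves 2 and 4 only; widening the window to two adds waves 1 and 5, which is the kind of change the Results compare. Note that under the assumed logistic model the marginal proportion missing equals `base_rate` exactly only when `strength` is zero.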
