Outcome-sensitive multiple imputation: a simulation study

Background
Multiple imputation is frequently used to deal with missing data in healthcare research. Although it is known that the outcome should be included in the imputation model when imputing missing covariate values, it is not known whether missing outcome values should themselves be imputed. Similarly, no clear recommendations exist on the utility of incorporating a secondary outcome, if available, in the imputation model; on the level of protection offered when data are missing not at random; or on the implications of dataset size and the extent of missingness.

Methods
We used realistic assumptions to generate thousands of datasets across a broad spectrum of contexts: three mechanisms of missingness (missing completely at random, at random, and not at random); varying extents of missingness (20–80% missing data); and two sample sizes (1,000 or 10,000 cases). For each context we quantified the performance of a complete case analysis and of seven multiple imputation methods, which deleted cases with a missing outcome before imputation, after imputation or not at all; included or excluded the outcome in the imputation models; and included or excluded a secondary outcome in the imputation models. Methods were compared on mean absolute error, bias, coverage and power over 1,000 datasets for each scenario.

Results
Overall, there was very little to separate the multiple imputation methods that included the outcome in the imputation model. Even when missingness was extensive, all multiple imputation approaches performed well. Incorporating a secondary outcome, moderately correlated with the outcome of interest, made very little difference. As expected, dataset size and the extent of missingness affected performance. Multiple imputation methods protected less well against data missing not at random, but did offer some protection.

Conclusions
As long as the outcome is included in the imputation model, there are very small performance differences between the possible multiple imputation approaches: no outcome imputation, outcome imputation, and outcome imputation followed by deletion. All informative covariates, even those with very high levels of missingness, should be included in the multiple imputation model. Multiple imputation offers some protection against a simple missing not at random mechanism.
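
The three missingness mechanisms can be illustrated with a short simulation. The sketch below is not the authors' data-generating code: the helper `impose_missingness`, its rank-based selection rule and the unit-variance noise that sets how strongly missingness depends on the data are all assumptions made for illustration. It deletes a fixed share of a covariate and then compares the mean of the values that remain.

```python
import random
import statistics


def impose_missingness(values, mechanism, rate, mar_driver=None, seed=0):
    """Return a copy of `values` with round(rate * n) entries set to None.

    Hypothetical helper: each row gets a selection score and the rows with
    the highest scores are made missing.
      MCAR: score is pure noise, so missingness is unrelated to the data.
      MAR:  score rises with `mar_driver`, a variable that stays observed.
      MNAR: score rises with the very value that is about to be deleted.
    """
    rng = random.Random(seed)
    n = len(values)
    if mechanism == "MCAR":
        driver = [0.0] * n
    elif mechanism == "MAR":
        if mar_driver is None:
            raise ValueError("MAR needs an always-observed driver variable")
        driver = list(mar_driver)
    elif mechanism == "MNAR":
        driver = list(values)
    else:
        raise ValueError(f"unknown mechanism: {mechanism!r}")
    # Unit-variance noise (an assumption) sets how strongly missingness
    # tracks the driver: larger driver values are more likely to be deleted.
    scores = [d + rng.gauss(0.0, 1.0) for d in driver]
    k = round(rate * n)
    ranked = sorted(range(n), key=scores.__getitem__, reverse=True)
    missing = set(ranked[:k])
    return [None if i in missing else v for i, v in enumerate(values)]


# Example: covariate x, outcome y = 0.5 * x + noise, 50% of x missing.
rng = random.Random(42)
x = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
y = [0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]
for mech in ("MCAR", "MAR", "MNAR"):
    x_obs = impose_missingness(x, mech, 0.5, mar_driver=y, seed=1)
    observed = [v for v in x_obs if v is not None]
    print(mech, len(observed), round(statistics.fmean(observed), 3))
```

Under MCAR the observed mean of `x` stays close to its true value of zero. Under MAR and MNAR it is pulled downwards, because high values of `x` are more likely to be removed, either through their correlation with `y` or directly. The MAR bias can in principle be removed by conditioning on `y`, which is one reason the outcome belongs in the imputation model; under MNAR conditioning on `y` recovers only part of the bias, because missingness depends on the unobserved values themselves.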

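Each method was summarised by bias, mean absolute error, coverage and power across 1,000 simulated datasets per scenario. A minimal sketch of that evaluation step follows, assuming each multiple imputation analysis is pooled with Rubin's rules. The function names `pool_rubin` and `summarise`, the 95% level and the use of a normal critical value are illustrative assumptions, not details taken from the study.

```python
from statistics import NormalDist, fmean, variance


def pool_rubin(estimates, variances):
    """Pool m completed-data results with Rubin's rules.

    estimates: point estimates, one per imputed dataset
    variances: their squared standard errors
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("Rubin's rules need at least two imputations")
    q_bar = fmean(estimates)            # pooled point estimate
    w = fmean(variances)                # within-imputation variance
    b = variance(estimates)             # between-imputation variance
    t = w + (1.0 + 1.0 / m) * b         # total variance
    return q_bar, t ** 0.5


def summarise(results, true_value, alpha=0.05):
    """Performance of one method over many simulated datasets.

    results: (estimate, standard_error) pairs, one per simulated dataset.
    Intervals use a normal critical value for simplicity; Rubin's t
    reference distribution would give slightly wider intervals.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    errors = [est - true_value for est, _ in results]
    return {
        "bias": fmean(errors),
        "mae": fmean(abs(e) for e in errors),
        # Share of intervals est +/- z*se that contain the true value.
        "coverage": fmean(
            1.0 if abs(est - true_value) <= z * se else 0.0 for est, se in results
        ),
        # Share of datasets in which H0: effect = 0 is rejected.
        "power": fmean(1.0 if abs(est) > z * se else 0.0 for est, se in results),
    }


# Three hypothetical imputed analyses of a single simulated dataset.
print(pool_rubin([1.0, 1.2, 0.8], [0.04, 0.04, 0.04]))
```

`pool_rubin` would be applied once per simulated dataset to combine the m imputed analyses; a complete case analysis skips pooling and passes its own estimate and standard error straight to `summarise`. Power is only meaningful in scenarios where the true effect is non-zero.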