Bias and Precision of the "Multiple Imputation, Then Deletion" Method for Dealing With Missing Outcome Data.

Multiple imputation (MI) is increasingly used to handle missing data in epidemiologic research. When data on both the exposure and the outcome are missing, an alternative to standard MI is the "multiple imputation, then deletion" (MID) method, in which all variables are imputed but observations with imputed outcomes are deleted prior to analysis. While MID has been shown to provide efficiency gains over standard MI when the analysis and imputation models are the same, its performance in the presence of auxiliary variables for the incomplete outcome is not well understood. Using simulated data, we evaluated the performance of standard MI and MID in regression settings where data were missing on both the outcome and the exposure and where an auxiliary variable associated with the incomplete outcome was included in the imputation model. When the auxiliary variable was unrelated to missingness in the outcome, both standard MI and MID produced negligible bias when estimating regression parameters, with standard MI being more efficient in most settings. However, when the auxiliary variable was also associated with missingness in the outcome, MID produced markedly biased parameter estimates. On the basis of these results, we recommend that researchers use standard MI rather than MID in the presence of auxiliary variables associated with an incomplete outcome.
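
To make the difference between the two procedures concrete, the following minimal sketch analyzes the same set of imputed data sets twice: once keeping every row (standard MI) and once after dropping the rows whose outcome was imputed (MID), pooling the results with Rubin's rules in both cases. It is an illustration under stated assumptions, not the simulation design used in the study. The function names (`ols_slope`, `rubin_pool`, `analyse`), the toy data, and the imputation step are all hypothetical; in particular, the imputation here is a crude stand-in that draws from the known data-generating model, the exposure is treated as already completed, and no auxiliary variable is included.

```python
import random
from statistics import mean, variance


def ols_slope(x, y):
    """Return the slope of y on x and its estimated sampling variance."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, rss / (n - 2) / sxx


def rubin_pool(estimates, variances):
    """Pool m estimates with Rubin's rules: total = W + (1 + 1/m) * B."""
    m = len(estimates)
    within = mean(variances)
    between = variance(estimates)  # sample variance, divisor m - 1
    return mean(estimates), within + (1 + 1 / m) * between


def analyse(imputed_sets, y_was_missing, delete_imputed_y):
    """Fit the analysis model to each imputed data set and pool the slopes.

    delete_imputed_y=False corresponds to standard MI; True corresponds to
    MID, where rows whose outcome was imputed are removed before fitting.
    """
    estimates, variances = [], []
    for x, y in imputed_sets:
        if delete_imputed_y:
            rows = [(xi, yi) for xi, yi, miss in zip(x, y, y_was_missing)
                    if not miss]
            x = [r[0] for r in rows]
            y = [r[1] for r in rows]
        slope, var = ols_slope(x, y)
        estimates.append(slope)
        variances.append(var)
    return rubin_pool(estimates, variances)


# Toy data (hypothetical): y = 1 + 2x + noise, with about 30% of y missing.
random.seed(0)
n, m = 200, 5
x = [random.gauss(0, 1) for _ in range(n)]
y_obs = [1 + 2 * xi + random.gauss(0, 1) for xi in x]
y_was_missing = [random.random() < 0.3 for _ in range(n)]

# Stand-in imputation (assumption): draw missing y from the true model.
# A real analysis would use a proper imputation model instead.
imputed_sets = []
for _ in range(m):
    y_imp = [1 + 2 * xi + random.gauss(0, 1) if miss else yi
             for xi, yi, miss in zip(x, y_obs, y_was_missing)]
    imputed_sets.append((x, y_imp))

print("standard MI:", analyse(imputed_sets, y_was_missing, False))
print("MID:        ", analyse(imputed_sets, y_was_missing, True))
```

The only difference between the two calls is the `delete_imputed_y` flag, which mirrors the definition of MID: imputed outcomes help impute other variables but do not enter the analysis model.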
