Missing data analysis using multiple imputation: getting to the heart of the matter.

Missing data are a pervasive problem in health investigations. We describe some background on missing data analysis and criticize ad hoc methods that are prone to serious problems. We then focus on multiple imputation, in which missing values are first filled in with several sets of plausible values to create multiple completed datasets, standard complete-data procedures are then applied to each completed dataset, and the multiple sets of results are finally combined to yield a single inference. We introduce the basic concepts and general methodology and provide some guidance for application. For illustration, we use a study assessing the effect of cardiovascular disease on discussions about hospice among patients with late-stage lung cancer.
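The three steps (impute, analyze, combine) can be sketched in Python with NumPy. This is a minimal illustration on simulated data, not the hospice study and not the authors' implementation: the normal linear imputation model, the function names `impute_normal_regression` and `rubin_combine`, the missing-at-random mechanism, and the choice of M = 20 imputations are all assumptions made for the example. Each imputation first draws the regression parameters from their posterior (flat prior) so that the imputed values reflect parameter uncertainty; the results are pooled with Rubin's combining rules. A complete-case mean, a common ad hoc approach, is printed for comparison.

```python
import numpy as np


def impute_normal_regression(x, y, rng):
    """One proper imputation of missing y under the assumed model y ~ N(b0 + b1*x, sigma^2).

    Draws (sigma^2, beta) from their posterior under a flat prior before
    predicting, so the imputations carry parameter uncertainty.
    """
    obs = ~np.isnan(y)
    X = np.column_stack([np.ones_like(x), x])
    Xo, yo = X[obs], y[obs]
    n_obs, p = Xo.shape
    beta_hat, *_ = np.linalg.lstsq(Xo, yo, rcond=None)
    resid = yo - Xo @ beta_hat
    # sigma^2 | data ~ RSS / chi^2_{n_obs - p}
    sigma2 = resid @ resid / rng.chisquare(n_obs - p)
    # beta | sigma^2, data ~ N(beta_hat, sigma^2 (X'X)^{-1})
    beta = rng.multivariate_normal(beta_hat, sigma2 * np.linalg.inv(Xo.T @ Xo))
    y_imp = y.copy()
    Xm = X[~obs]
    y_imp[~obs] = Xm @ beta + rng.normal(0.0, np.sqrt(sigma2), size=Xm.shape[0])
    return y_imp


def rubin_combine(estimates, variances):
    """Pool M completed-data estimates and their variances with Rubin's rules."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()                 # pooled point estimate
    w = u.mean()                     # within-imputation variance
    b = q.var(ddof=1)                # between-imputation variance
    t = w + (1.0 + 1.0 / m) * b      # total variance
    if b == 0:
        return q_bar, t, np.inf      # no between-imputation spread
    r = (1.0 + 1.0 / m) * b / w      # relative increase in variance
    df = (m - 1) * (1.0 + 1.0 / r) ** 2
    return q_bar, t, df


# Simulated data (hypothetical): x fully observed, true mean of y is 1.0.
rng = np.random.default_rng(2024)
n = 500
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
# Missing at random: y is more likely to be missing when x is large.
p_miss = 1.0 / (1.0 + np.exp(-(x - 0.5)))
y_obs = np.where(rng.uniform(size=n) < p_miss, np.nan, y)

M = 20
estimates, variances = [], []
for _ in range(M):
    y_c = impute_normal_regression(x, y_obs, rng)   # step 1: impute
    estimates.append(y_c.mean())                    # step 2: analyze
    variances.append(y_c.var(ddof=1) / n)
q_bar, t, df = rubin_combine(estimates, variances)  # step 3: combine

cc_mean = np.nanmean(y_obs)
print(f"complete-case mean: {cc_mean:.3f}")
print(f"MI mean: {q_bar:.3f} (SE {np.sqrt(t):.3f}, df {df:.1f})")
```

Because missingness here depends on x, the complete-case mean is biased downward, while the multiply imputed estimate, which conditions on x in the imputation model, should land near the true value with a standard error that includes the between-imputation variance.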
