Multiple Imputation with Diagnostics (mi) in R: Opening Windows into the Black Box

Our mi package in R has several features that allow the user to get inside the imputation process and evaluate the reasonableness of the resulting models and imputations. These features include: choice of predictors, models, and transformations for chained imputation models; standard and binned residual plots for checking the fit of the conditional distributions used for imputation; and plots for comparing the distributions of observed and imputed data. In addition, we use Bayesian models and weakly informative prior distributions to construct more stable estimates of imputation models. Our goal is to have a demonstration package that (a) avoids many of the practical problems that arise with existing multivariate imputation programs, and (b) demonstrates state-of-the-art diagnostics that can be applied more generally and can be incorporated into the software of others.
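As an illustration of the binned residual check mentioned above, the following is a minimal sketch in base R. It does not use the mi package's own functions. The simulated data, the helper name `binned_residuals`, and the choice of 20 equal-count bins are assumptions made for this example only. The idea is to sort the fitted values of a conditional model, group them into bins, and examine the average residual within each bin.

```r
# Minimal sketch of binned residuals for a logistic regression, in base R.
# Not the mi package's implementation; data and helper name are hypothetical.
set.seed(1)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))  # simulated binary outcome

# Conditional model of the kind a chained-imputation step might fit
fit <- glm(y ~ x, family = binomial)
fitted_p <- fitted(fit)        # predicted probabilities
resid_raw <- y - fitted_p      # raw (response-scale) residuals

binned_residuals <- function(fitted_p, resid_raw, n_bins = 20) {
  # Assign observations to bins of (nearly) equal size by rank of fitted value
  bin <- cut(rank(fitted_p, ties.method = "first"),
             breaks = n_bins, labels = FALSE)
  n_per_bin <- as.vector(table(bin))
  data.frame(
    mean_fitted = as.vector(tapply(fitted_p, bin, mean)),
    mean_resid  = as.vector(tapply(resid_raw, bin, mean)),
    n           = n_per_bin,
    # Approximate standard error of the mean residual within each bin
    se          = as.vector(tapply(resid_raw, bin, sd)) / sqrt(n_per_bin)
  )
}

br <- binned_residuals(fitted_p, resid_raw)
print(head(br))
```

Plotting `mean_resid` against `mean_fitted`, with reference lines at plus and minus two times `se`, gives the usual display: if the model fits reasonably well, most binned averages should fall inside those bounds and show no systematic pattern.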
