Multiple Imputation of Missing Data: A Simulation Study on a Binary Response

A growing number of programs for multiple imputation of missing values are becoming available in statistical software. Two algorithms are most commonly implemented: Expectation Maximization (EM) and Multiple Imputation by Chained Equations (MICE). Both have been shown to work well in large samples or when only small proportions of missing data need to be imputed. However, some researchers have begun to impute large proportions of missing data or to apply the methods to small samples. A simulation was performed using MICE on datasets with 50, 100, or 200 cases and four or eleven variables. A varying proportion of data (3% to 63%) was set as missing completely at random and subsequently imputed using MICE. In a logistic regression model, four coefficients were examined: non-zero and zero main effects as well as non-zero and zero interaction effects. Estimates of all main and interaction effects were unbiased. The variance of the estimates was considerable, however, increasing with the proportion of missing data and decreasing with sample size. Multiple imputation by chained equations is a useful tool for imputing small to moderate proportions of missing data, but it has its limits: in small samples, random error is considerable for all effects.
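Two steps of the simulation design described above can be illustrated with a minimal Python sketch: setting values as missing completely at random (MCAR), and combining the estimates from several imputed datasets with Rubin's rules. This is not the code used in the study. The function names `make_mcar` and `pool_rubin`, the seed, the choice to mask every cell with the same probability, and the example numbers are all illustrative assumptions, and the chained-equations imputation step itself is left out.

```python
import random
import statistics


def make_mcar(rows, proportion, rng):
    """Return a copy of rows in which each cell is independently replaced
    by None with probability `proportion` (missing completely at random).
    Assumption: every variable is masked with the same probability."""
    return [[None if rng.random() < proportion else value for value in row]
            for row in rows]


def pool_rubin(estimates, variances):
    """Combine m per-imputation estimates with Rubin's rules.

    estimates: one coefficient estimate per imputed dataset
    variances: the matching squared standard errors
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    pooled = statistics.mean(estimates)        # pooled point estimate
    within = statistics.mean(variances)        # within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical example: 100 cases, 4 variables, 30% of cells set missing.
rng = random.Random(42)
data = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(100)]
masked = make_mcar(data, 0.30, rng)
share_missing = sum(value is None for row in masked for value in row) / 400

# Made-up coefficient estimates and squared standard errors from m = 5 imputations.
estimates = [0.52, 0.47, 0.55, 0.49, 0.50]
variances = [0.040, 0.045, 0.038, 0.042, 0.041]
pooled, total_var = pool_rubin(estimates, variances)
```

In the pooling step the between-imputation variance enters the total variance with the factor (1 + 1/m). Estimates that differ more across imputed datasets, as can happen with more missing data, therefore yield a larger total variance, which fits the pattern reported above.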
