Logistic Regression with Missing Data: A Comparison of Handling Methods, and Effects of Percent Missing Values

This article compares five popular methods for handling missing data: listwise deletion, mean substitution, regression imputation, stochastic imputation, and multiple imputation. Three missing data mechanisms are investigated: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). Data are generated by Monte Carlo simulation, and logistic regression parameters are then estimated under each handling method. The findings show that, among the five methods, multiple imputation performs well under both MCAR and MAR. There is no evidence that listwise deletion or multiple imputation produces biased parameter estimates under MCAR. None of the five techniques can handle MNAR. Finally, the article suggests a maximum percentage of missing data and a sample size for the listwise deletion and multiple imputation techniques.
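
To make the simulation design concrete, the following is a minimal Python sketch of a single Monte Carlo comparison under MCAR, restricted to two of the five methods (listwise deletion and mean substitution). It is not the authors' code. The sample size, true coefficients, missing rate, number of replications, and every function name (`fit_logistic`, `simulate_once`, `run_simulation`) are illustrative assumptions, and the logistic model is fitted with a plain Newton-Raphson loop rather than any particular statistics package.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson.

    X must already contain an intercept column. Hypothetical helper,
    written out so the sketch needs only NumPy.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)              # score vector
        hess = X.T @ (X * w[:, None])     # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def simulate_once(rng, n=500, beta=(-0.5, 1.0), p_missing=0.2):
    """One replication: generate data, delete x values under MCAR, refit.

    All default values are assumptions chosen for illustration only.
    """
    x = rng.normal(size=n)
    eta = beta[0] + beta[1] * x
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

    # MCAR: whether x is missing is independent of both x and y.
    miss = rng.random(n) < p_missing
    keep = ~miss

    # Listwise deletion: drop every case with a missing x.
    X_lw = np.column_stack([np.ones(keep.sum()), x[keep]])
    b_lw = fit_logistic(X_lw, y[keep])

    # Mean substitution: replace each missing x with the observed mean.
    x_ms = np.where(miss, x[keep].mean(), x)
    X_ms = np.column_stack([np.ones(n), x_ms])
    b_ms = fit_logistic(X_ms, y)

    return b_lw, b_ms


def run_simulation(n_reps=200, seed=0, **kwargs):
    """Average the estimates over n_reps replications.

    Returns [intercept_lw, slope_lw, intercept_ms, slope_ms].
    """
    rng = np.random.default_rng(seed)
    reps = [np.concatenate(simulate_once(rng, **kwargs)) for _ in range(n_reps)]
    return np.array(reps).mean(axis=0)


if __name__ == "__main__":
    means = run_simulation()
    print("listwise deletion  (intercept, slope):", means[:2])
    print("mean substitution  (intercept, slope):", means[2:])
```

Comparing the averaged estimates with the true coefficients gives an empirical bias for each method. Extending the sketch to MAR or MNAR would mean making `miss` depend on `y` or on `x` itself, and the imputation-based methods would each need their own fitting step.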
