Multiple imputation for incomplete traffic accident data using chained equations

Missing values in traffic accident data hinder the discovery of significant factors for reducing accident severity and can even lead to invalid conclusions. Previous studies mainly addressed this problem by adapting the analysis methodologies to fit the incomplete data. In this paper, we instead propose imputing the missing values in traffic accident data sets with multiple imputation by chained equations (MICE), a flexible and practical method that can cope with both univariate and multivariate missing values. The proposed algorithm is compared with two traditional imputation methods on two publicly available traffic accident datasets from New York. Furthermore, we test the performance of the model under different missing ratios and analyze the imputations for continuous and discrete variables separately. The results indicate that the proposed model outperforms the other two models in almost all situations.
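To make the chained-equations idea concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes a toy table with two continuous columns (the speed and stopping-distance values are invented for illustration), models each column by a simple linear regression on the other, and adds Gaussian noise to each imputed value. The names `fit_line`, `mice_once`, and `mice` are hypothetical. A full MICE procedure would also draw the regression parameters from their posterior and use suitable models for discrete variables, both of which are omitted here.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Least-squares slope, intercept and residual std. dev. for y ~ x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sigma = (sum(r * r for r in resid) / max(len(xs) - 2, 1)) ** 0.5
    return slope, intercept, sigma


def mice_once(rows, n_iter=10, rng=None):
    """Produce one completed copy of a two-column table (None = missing)."""
    rng = rng or random.Random(0)
    missing = [[value is None for value in row] for row in rows]

    # Step 1: start from a simple guess, the mean of the observed values.
    filled = [list(row) for row in rows]
    for j in range(2):
        col_mean = mean(row[j] for row in rows if row[j] is not None)
        for i, row in enumerate(filled):
            if missing[i][j]:
                row[j] = col_mean

    # Step 2: cycle through the columns, re-imputing each one from a
    # regression on the current values of the other column.
    for _ in range(n_iter):
        for j in range(2):
            k = 1 - j  # index of the predictor column
            observed = [i for i in range(len(rows)) if not missing[i][j]]
            slope, intercept, sigma = fit_line(
                [filled[i][k] for i in observed],
                [filled[i][j] for i in observed],
            )
            for i in range(len(rows)):
                if missing[i][j]:
                    noise = rng.gauss(0, sigma)
                    filled[i][j] = intercept + slope * filled[i][k] + noise
    return filled


def mice(rows, m=5, n_iter=10, seed=0):
    """Return m completed data sets, each from an independent random stream."""
    return [mice_once(rows, n_iter, random.Random(seed + d)) for d in range(m)]


# Invented toy data: (speed, stopping distance), with two missing cells.
rows = [(30, 12.0), (50, 28.0), (None, 20.0), (70, 50.0),
        (60, None), (40, 19.0), (80, 64.0)]
completed = mice(rows)

# Pool a simple estimate (mean stopping distance) across the m data sets.
pooled_mean = mean(mean(row[1] for row in data) for data in completed)
```

Each of the `m` completed data sets would be analyzed separately and the estimates combined, as the last line does for a single mean. Keeping several imputations rather than one is what lets the analysis reflect the uncertainty about the missing values.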
