Extracting grey relational systems from incomplete road traffic accidents data: the case of Gauteng Province in South Africa

Motivation: Road traffic accidents are among the leading causes of death and injury, of varying severity, in South Africa. Given the large volume of data generated by road traffic accidents, traffic accident prediction has become a central challenge in transportation data analysis. Accident prediction aims to detect the patterns involved in dangerous crashes and thus to support decision making and planning before casualties and losses occur. Researchers have recently proposed a wide range of prediction techniques. Most of these are statistical methods, which usually fail to provide insight into their prediction results. This has led to the development and application of supervised learning algorithms (classifiers) in an attempt to predict accidents more accurately in terms of injury severity (fatal, serious, slight, or property damage with no injury). Even so, learning an accurate classifier from instances raises a number of new issues, some of which transportation research has not properly addressed. An effective prediction method is therefore required to improve predictive accuracy.

Results: The essence of the paper is the proposal that accident prediction from poor-quality (incomplete) data can be improved by using a classifier based on grey relational analysis, a similarity-based method. We evaluate the grey relational classifier against other state-of-the-art classifiers, including artificial neural networks, classification and regression trees, k-nearest neighbour, linear discriminant analysis, the naive Bayes classifier, the algorithm quasi-optimal (AQ) and support vector machines. A real-world road traffic accident dataset from Gauteng Province is used for this task. Experimental results illustrate the efficiency and robustness of the grey relational classifier in terms of road traffic accident predictive accuracy.

2013 Wiley Publishing Ltd.
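The abstract describes the grey relational classifier only as a similarity-based method. As a rough illustration, the sketch below ranks training instances by a grey relational grade in the spirit of Deng's grey relational analysis: for each attribute k, the coefficient is (Δmin + ρ·Δmax) / (Δ_i(k) + ρ·Δmax), where Δ_i(k) is the absolute difference between the query and training instance i, and the grade is the mean coefficient. Several choices here are assumptions, not the paper's method: attributes are numeric and pre-normalised to [0, 1], ρ = 0.5, an attribute missing (None) in either vector is simply skipped rather than imputed, and the label is decided by grade-weighted voting among the k most related instances. The function names grey_relational_grades and grey_relational_classify are hypothetical, and the toy data are invented.

```python
from collections import defaultdict


def grey_relational_grades(query, references, rho=0.5):
    """Grey relational grade of each reference instance with respect to `query`.

    Assumes numeric attributes already normalised to a common scale; None marks
    a missing value. An attribute missing in either vector is skipped for that
    pair (an assumed rule, not necessarily the paper's).
    """
    # Absolute differences Delta_i(k), or None where either value is missing.
    deltas = [
        [abs(q - r) if q is not None and r is not None else None
         for q, r in zip(query, ref)]
        for ref in references
    ]
    observed = [d for row in deltas for d in row if d is not None]
    if not observed:
        raise ValueError("query shares no observed attributes with the references")
    d_min, d_max = min(observed), max(observed)

    grades = []
    for row in deltas:
        # Grey relational coefficient for each attribute available in both vectors.
        coeffs = [
            1.0 if d_max == 0 else (d_min + rho * d_max) / (d + rho * d_max)
            for d in row if d is not None
        ]
        # Grade = mean coefficient; 0.0 if the pair shares no attributes.
        grades.append(sum(coeffs) / len(coeffs) if coeffs else 0.0)
    return grades


def grey_relational_classify(query, X_train, y_train, k=3, rho=0.5):
    """Predict a label by grade-weighted voting among the k most related instances."""
    grades = grey_relational_grades(query, X_train, rho)
    top = sorted(range(len(X_train)), key=lambda i: grades[i], reverse=True)[:k]
    votes = defaultdict(float)
    for i in top:
        votes[y_train[i]] += grades[i]
    return max(votes, key=votes.get)


# Invented toy data (not from the Gauteng dataset); attributes scaled to [0, 1].
X_train = [
    [0.1, 0.2, 0.9],
    [0.2, 0.1, 0.8],
    [0.9, 0.8, 0.1],
    [0.8, 0.9, None],  # incomplete training instance
]
y_train = ["slight", "slight", "fatal", "fatal"]
query = [0.85, None, 0.2]  # incomplete query instance

print(grey_relational_classify(query, X_train, y_train))  # fatal
```

Skipping attributes that are missing in either vector lets every instance, complete or not, take part in the comparison without a separate imputation step. An imputation method could be substituted for this rule if preferred.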
