Combining probabilistic neural networks and decision trees for maximally accurate and efficient accident prediction

The extent to which accident severity can be predicted from accident-related data collected at a variety of locations is investigated. The 2005 accident dataset compiled by the Republic of Cyprus Police is employed; it comprises 1407 records, each with 43 continuous and categorical input parameters and a single categorical output parameter representing accident severity. The database is used as is: the parameters that are significant for the prediction task are not singled out, and the records are not modified in any way (e.g. via record selection or transformation). Aiming at maximally accurate and efficient prediction, a combination of probabilistic neural networks (PNNs) and decision trees (DTs) is implemented: the simple training and direct operation of the PNN are complemented by the hierarchical, exhaustive and recursive construction of the DT. By training pairs of PNNs on the data partitions derived from the minimal necessary number of top DT nodes, both efficiency and accident prediction accuracy are maximized.
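The PNN/DT combination can be illustrated with a minimal sketch. It assumes purely numeric features, a Gini impurity split criterion and a single binary root split, so that the "pair" of PNNs is one PNN per child node. The abstract does not state the split criterion, the PNN smoothing parameter or how categorical parameters are encoded. The names `PNN`, `best_root_split`, `SplitPNN`, `sigma` and the severity labels are therefore hypothetical, and this is not the authors' implementation.

```python
import math
from collections import defaultdict


class PNN:
    """Minimal probabilistic neural network (Parzen-window classifier).

    Training only stores the patterns; classification averages a Gaussian
    kernel over each class's stored patterns and picks the largest score.
    """

    def __init__(self, sigma=1.0):
        self.sigma = sigma  # kernel width (smoothing parameter); hypothetical value
        self.patterns = defaultdict(list)

    def fit(self, X, y):
        # Pattern layer: one unit per training record, grouped by class
        for x, label in zip(X, y):
            self.patterns[label].append(list(x))
        return self

    def predict_one(self, x):
        best_label, best_score = None, -1.0
        for label, pats in self.patterns.items():
            # Summation layer: mean Gaussian activation of this class's units
            score = sum(
                math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)) / (2 * self.sigma ** 2))
                for p in pats
            ) / len(pats)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = defaultdict(int)
    for lab in labels:
        counts[lab] += 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_root_split(X, y):
    """Exhaustively search (feature, threshold) minimising weighted Gini impurity."""
    n = len(y)
    best = None  # (impurity, feature, threshold)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint, so both children are non-empty
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            imp = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or imp < best[0]:
                best = (imp, f, t)
    if best is None:
        raise ValueError("no feature takes more than one value")
    return best[1], best[2]


class SplitPNN:
    """Root DT node routes each record to one of a pair of PNNs (hypothetical)."""

    def __init__(self, sigma=1.0):
        self.sigma = sigma

    def fit(self, X, y):
        self.feature, self.threshold = best_root_split(X, y)
        left = [(x, lab) for x, lab in zip(X, y) if x[self.feature] <= self.threshold]
        right = [(x, lab) for x, lab in zip(X, y) if x[self.feature] > self.threshold]
        # Train one PNN per partition of the root node
        self.left = PNN(self.sigma).fit(*zip(*left))
        self.right = PNN(self.sigma).fit(*zip(*right))
        return self

    def predict(self, X):
        return [
            (self.left if x[self.feature] <= self.threshold else self.right).predict_one(x)
            for x in X
        ]


# Hypothetical toy data: two numeric features, three severity labels
X = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9], [5.0, 5.0], [5.2, 4.9]]
y = ["slight", "slight", "serious", "serious", "fatal", "fatal"]
model = SplitPNN(sigma=1.0).fit(X, y)
print(model.predict([[0.1, 0.0], [1.05, 1.0], [4.8, 5.1]]))
# → ['slight', 'serious', 'fatal']
```

PNN training here amounts to storing the records, which is what makes it simple and direct. The DT split confines each PNN to a smaller, more homogeneous partition. Deeper splits (more top DT nodes) would yield more partitions. The sketch stops at the root.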
