Big Vehicular Traffic Data Mining: Towards Accident and Congestion Prevention

In 2013, 32,719 people died in traffic crashes in the USA: on average, almost 90 people lose their lives every day and more than 250 are injured every hour. Reducing traffic crashes would therefore enhance road safety. Traffic crashes also cause traffic congestion, which has become unbearable, especially in mega-cities; the direct and indirect losses from traffic congestion alone exceed $124 billion. The availability of Big Data on traffic crashes, together with Big Data analytics tools, can help us gain useful insights to enhance road safety and reduce traffic crashes. In this paper, we use the H2O and WEKA data mining tools. We apply feature selection techniques to find the most important predictors, and we tackle the problem of class imbalance by employing bagging and using different quality measures. Furthermore, we evaluate the performance of five classifiers to: (1) conduct Big Data analysis on a large traffic accidents dataset of 146,322 examples, find useful insights and patterns in the data, and forecast possible accidents in advance; and (2) conduct Big Data analysis on a large vehicular casualties dataset of 194,477 examples to study drivers' behavior on the road. From this driver-behavior mining, we can predict the driver's age and sex as well as the accident severity. Decision makers and practitioners can use these analyses to develop new traffic rules and policies that prevent accidents and increase roadway safety.
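The abstract does not say how bagging was configured in WEKA or H2O, so the following is only a minimal sketch, in plain Python rather than either tool, of one common way to combine bagging with class-imbalance handling: each bootstrap sample is drawn with equal counts per class (often called balanced bagging or under-bagging), a decision stump is fit to each sample, and predictions are made by majority vote. It also shows why quality measures other than accuracy matter on imbalanced data: a classifier that always predicts the majority class scores 90% accuracy but a G-mean of 0. The function names (`balanced_bootstrap`, `fit_stump`, `fit_bagging`, `quality`), the stump base learner, and the toy data are all hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def balanced_bootstrap(X, y, rng):
    """Draw one bootstrap sample with equal counts per class (under-bagging).

    Every class contributes as many rows as the smallest class, sampled
    with replacement. This is an assumed variant, not the paper's setup.
    """
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n = min(len(rows) for rows in by_class.values())
    Xs, ys = [], []
    for label, rows in by_class.items():
        for _ in range(n):
            Xs.append(rng.choice(rows))
            ys.append(label)
    return Xs, ys


def fit_stump(X, y):
    """Fit a one-feature threshold rule (decision stump) with the fewest errors."""
    labels = sorted(set(y))
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for lo, hi in ((labels[0], labels[-1]), (labels[-1], labels[0])):
                errors = sum((lo if row[f] <= t else hi) != yi for row, yi in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, f, t, lo, hi)
    _, f, t, lo, hi = best
    return lambda row: lo if row[f] <= t else hi


def fit_bagging(X, y, n_bags=25, seed=0):
    """Train one stump per balanced bootstrap sample."""
    rng = random.Random(seed)
    return [fit_stump(*balanced_bootstrap(X, y, rng)) for _ in range(n_bags)]


def predict(models, row):
    """Majority vote over the bagged stumps."""
    return Counter(m(row) for m in models).most_common(1)[0][0]


def quality(y_true, y_pred, positive):
    """Accuracy plus imbalance-aware measures: TPR, TNR and their geometric mean."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "tpr": tpr,
        "tnr": tnr,
        "g_mean": (tpr * tnr) ** 0.5,
    }


# Hypothetical toy data: one numeric feature, 90 "slight" vs 10 "serious" accidents.
X = [[i] for i in range(100)]
y = ["serious" if i >= 90 else "slight" for i in range(100)]

# Baseline that always predicts the majority class.
baseline = quality(y, ["slight"] * len(y), positive="serious")
models = fit_bagging(X, y)
bagged = quality(y, [predict(models, row) for row in X], positive="serious")
print("majority baseline:", baseline)
print("balanced bagging:", bagged)
```

The baseline's 0.9 accuracy hides the fact that it never detects a serious accident (TPR of 0), which the G-mean exposes. The bagged stumps trade a few majority-class errors for full recall on the minority class. In practice, a stronger base learner (for example the classifiers evaluated in the paper) would replace the stump, and AUC could be reported alongside these measures.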
