Predictability and interpretability of hybrid link-level crash frequency models for urban arterials compared to cluster-based and general negative binomial regression models

ABSTRACT

Machine learning (ML) techniques have shown higher prediction accuracy than conventional statistical methods in crash frequency modelling. However, their black-box nature limits their interpretability. The objective of this research is to combine ML and statistical methods to develop hybrid link-level crash frequency models with both high predictability and high interpretability. For this purpose, the M5′ model tree method (M5′) is introduced and applied to classify the crash data and then calibrate a model for each homogeneous class. Data for 1134 and 345 randomly selected links on urban arterials in the city of Charlotte, North Carolina, were used to develop and validate the models, respectively. The outputs of the hybrid approach are compared with those of cluster-based negative binomial regression (NBR) and general NBR models. The findings indicate that, compared with the other models developed, M5′ offers high predictability and a more reliable interpretation of the role of different attributes in crash frequency.
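The "classify, then calibrate a model per class" idea behind model trees can be illustrated with a deliberately simplified sketch. It assumes the standard M5 split criterion, standard deviation reduction (SDR), where a split that partitions a set T into subsets T_i scores sd(T) − Σ |T_i|/|T| · sd(T_i). It makes a single split and fits an ordinary least-squares linear model in each leaf. It omits the pruning, smoothing and recursive growth of full M5′, and the function names (`best_split`, `fit_model_stump`, `predict`) are hypothetical. The per-class models calibrated in this study may differ from the linear leaves used here, so this is not the authors' implementation.

```python
import statistics
from typing import Dict, List, Optional, Tuple

Row = List[float]


def sd(values: List[float]) -> float:
    """Population standard deviation; 0 for fewer than two values."""
    return statistics.pstdev(values) if len(values) > 1 else 0.0


def best_split(X: List[Row], y: List[float], min_leaf: int) -> Optional[Tuple[float, int, float]]:
    """Return (SDR, feature index, threshold) of the split with the largest SDR."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            n = len(y)
            score = sd(y) - len(left) / n * sd(left) - len(right) / n * sd(right)
            if best is None or score > best[0]:
                best = (score, j, t)
    return best


def fit_ols(X: List[Row], y: List[float]) -> List[float]:
    """Least squares with intercept via normal equations and Gaussian elimination."""
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)] for i in range(p)]
    v = [sum(a[i] * yi for a, yi in zip(A, y)) for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))  # partial pivoting
        M[col], M[piv] = M[piv], M[col]
        v[col], v[piv] = v[piv], v[col]
        if abs(M[col][col]) < 1e-12:
            raise ValueError("singular design matrix in leaf")
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p):
                M[r][c] -= f * M[col][c]
            v[r] -= f * v[col]
    b = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        b[i] = (v[i] - sum(M[i][c] * b[c] for c in range(i + 1, p))) / M[i][i]
    return b


def fit_model_stump(X: List[Row], y: List[float], min_leaf: int = 3) -> Dict:
    """Split once by SDR, then calibrate a linear model for each resulting class."""
    split = best_split(X, y, min_leaf)
    if split is None:  # no admissible split: a single global model
        coefs = fit_ols(X, y)
        return {"feature": None, "threshold": None, "left": coefs, "right": coefs}
    _, j, t = split
    left = [(r, yi) for r, yi in zip(X, y) if r[j] <= t]
    right = [(r, yi) for r, yi in zip(X, y) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": fit_ols([r for r, _ in left], [yi for _, yi in left]),
        "right": fit_ols([r for r, _ in right], [yi for _, yi in right]),
    }


def predict(tree: Dict, row: Row) -> float:
    """Route the row to its class, then apply that class's linear model."""
    j = tree["feature"]
    coefs = tree["left"] if j is None or row[j] <= tree["threshold"] else tree["right"]
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))
```

For example, on toy data that follows y = 2x + 1 for x ≤ 4 and y = 50 − 3x for x ≥ 5, SDR selects the threshold x ≤ 4, and each leaf recovers its own line. This reflects the motivation for the hybrid approach: the split yields interpretable classes, and each class receives its own interpretable model.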
