How to determine an optimal threshold to classify real-time crash-prone traffic conditions?

One proactive approach to reducing traffic crashes is to identify hazardous traffic conditions that may lead to a crash, an approach known as real-time crash prediction. Threshold selection is an essential step in real-time crash prediction. Once a crash risk evaluation model has estimated the posterior probability of a crash under a given traffic condition, the threshold is the cut-off on that probability that separates potential crash warnings from normal traffic conditions. However, little research has focused on how to determine an optimal threshold effectively. The few studies that address it chose the threshold subjectively, and only when discussing the predictive performance of their models. Subjective methods cannot automatically identify optimal thresholds under the different traffic and weather conditions encountered in real applications, so a theoretical method for selecting the threshold is needed to avoid subjective judgment. The purpose of this study is to provide such a method for automatically identifying the optimal threshold. To account for random effects of the explanatory variables across roadway segments, a mixed logit model was used to develop the crash risk evaluation model and estimate crash risk. Threshold selection methods based on cross-entropy, between-class variance, and other criteria were then investigated to identify the optimal threshold empirically, and K-fold cross-validation with several evaluation criteria was used to validate their performance. The results indicate that (i) the mixed logit model performs well, and (ii) according to these criteria, the threshold selected by the minimum cross-entropy method yields better classification performance than the thresholds selected by the other methods.
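The between-class variance criterion mentioned above is the one used by Otsu's method. The sketch below is a minimal illustration, assuming the crash risk model has already produced a posterior crash probability for each traffic observation. It applies Otsu's criterion directly to those probabilities rather than to a gray-level histogram. The function name `otsu_threshold` and the toy probabilities are hypothetical and are not taken from the study.

```python
import numpy as np


def otsu_threshold(probs):
    """Return the cut-off t maximising the between-class variance
    w1 * w2 * (mu1 - mu2)**2 of the groups {p < t} and {p >= t}.

    Assumption: Otsu's criterion applied to raw probabilities, treating
    each distinct probability value as a histogram bin.
    Returns None if all values are identical (no split is possible).
    """
    p = np.sort(np.asarray(probs, dtype=float))
    n = p.size
    csum = np.cumsum(p)        # running sums give class means in O(1)
    total = csum[-1] if n else 0.0
    best_t, best_var = None, -1.0
    # Split k: the k smallest values form the "normal" class.
    for k in range(1, n):
        if p[k] == p[k - 1]:
            continue           # t must fall between two distinct values
        w1, w2 = k / n, (n - k) / n               # class proportions
        mu1 = csum[k - 1] / k                     # mean of lower class
        mu2 = (total - csum[k - 1]) / (n - k)     # mean of upper class
        var_b = w1 * w2 * (mu1 - mu2) ** 2        # between-class variance
        if var_b > best_var:
            best_var, best_t = var_b, p[k]        # lower class is p < p[k]
    return best_t


# Toy posterior crash probabilities (hypothetical, not study data).
probs = [0.01, 0.02, 0.03, 0.8, 0.85, 0.9]
t = otsu_threshold(probs)
warnings = np.asarray(probs) >= t   # True = potential crash warning
print(t, warnings)
```

Restricting candidates to observed probability values keeps the search exact without choosing a bin width. In a real application, the threshold would be estimated on training folds and evaluated on held-out folds.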
The minimum cross-entropy method can automatically identify thresholds for crash prediction by minimizing the cross-entropy between the original dataset, which contains continuous crash probabilities, and the binarized dataset obtained by using the threshold to separate potential crash warnings from normal traffic conditions.
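One common formulation of minimum cross-entropy thresholding is Li and Lee's. In the binarized dataset, each value below the threshold t is replaced by the mean mu1 of its class, and each value at or above t is replaced by the mean mu2 of the other class. The chosen t minimizes D(t) = sum over p < t of p*ln(p/mu1) + sum over p >= t of p*ln(p/mu2). The sketch below assumes this formulation is applied directly to strictly positive crash probabilities; the study's exact objective may differ. The function name `min_cross_entropy_threshold` and the sample values are hypothetical.

```python
import numpy as np


def min_cross_entropy_threshold(probs):
    """Return the cut-off t minimising the Li-Lee cross-entropy
    D(t) = sum_{p<t} p*ln(p/mu1) + sum_{p>=t} p*ln(p/mu2),
    where mu1 and mu2 are the means of the two classes (the values of the
    binarized dataset). Returns None if no split is possible.
    """
    p = np.sort(np.asarray(probs, dtype=float))
    if np.any(p <= 0):
        raise ValueError("probabilities must be strictly positive")
    n = p.size
    csum = np.cumsum(p)              # running sums of probabilities
    total = csum[-1] if n else 0.0
    const = np.sum(p * np.log(p))    # sum p*ln(p), independent of t
    best_t, best_d = None, np.inf
    # Split k: the k smallest values form the lower ("normal") class.
    for k in range(1, n):
        if p[k] == p[k - 1]:
            continue                 # t must fall between distinct values
        s1 = csum[k - 1]             # sum of lower-class probabilities
        s2 = total - s1              # sum of upper-class probabilities
        mu1, mu2 = s1 / k, s2 / (n - k)
        d = const - s1 * np.log(mu1) - s2 * np.log(mu2)   # equals D(t)
        if d < best_d:
            best_d, best_t = d, p[k]  # lower class is p < p[k]
    return best_t


# Toy posterior crash probabilities (hypothetical, not study data).
probs = [0.01, 0.02, 0.03, 0.8, 0.85, 0.9]
t = min_cross_entropy_threshold(probs)
print(t, np.asarray(probs) >= t)     # True = potential crash warning
```

Because the sum of p*ln(p) does not depend on t, only the two mean terms change between candidate splits, so cumulative sums make each evaluation constant-time.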
