Analysis of Fatal Truck-Involved Work Zone Crashes in Florida: Application of Tree-Based Models

This paper analyzes fatal work zone crashes involving large trucks, using seven years of crash data from the State of Florida. Decision tree and random forest models were applied to detect the critical crash patterns that lead to a fatal outcome. Because crash severity data are imbalanced (fatal crashes are far less frequent than property-damage-only or injury crashes), the data were treated with random and systematic over-sampling techniques. Marginal effects were estimated using Shapley values to improve model explainability. From a methodological perspective, the results showed that combining over-sampling techniques with ensemble random forests could significantly improve performance in predicting fatal crashes compared with conventional logistic regression models. The primary contributors were pedestrian involvement, lighting conditions, safety equipment, driver condition, driver age, and work zone location. For pedestrian crashes, dark, unlighted conditions, a distracted truck driver, and driver age (young drivers outside city limits, senior drivers inside city limits) were associated with a high likelihood of a fatal outcome. For non-pedestrian crashes, front airbag deployment combined with any restraint system other than a shoulder and lap belt was quite likely to be fatal, and abnormal driver conditions also increased the risk of a fatal outcome. In addition, the presence of a female driver (as the second driver in multiple-vehicle crashes) substantially decreased crash severity, possibly because female drivers tend to drive more cautiously than male drivers. Interestingly, truck driver actions and maneuvers, as well as roadway design and other physical environment features (i.e., number of lanes, median type, roadway grade, and alignment), did not contribute significantly to the model.
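The random over-sampling step mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `random_oversample`, the toy records, and the choice to balance every class up to the size of the largest class are assumptions made for illustration, and the systematic variant is not covered.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    # Start from the original data so no observed record is lost.
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Sample with replacement; the majority class gets zero extras.
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Toy records invented for illustration: 1 = fatal, 0 = non-fatal.
rows = [
    ("dark", "pedestrian"),
    ("daylight", "none"),
    ("daylight", "none"),
    ("dusk", "none"),
    ("daylight", "none"),
    ("dark", "none"),
]
labels = [1, 0, 0, 0, 0, 0]

bal_rows, bal_labels = random_oversample(rows, labels)
print(Counter(bal_labels))
```

In practice, resampling of this kind is usually applied to the training split only, so that duplicated fatal records do not leak into the data used to evaluate the model.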
