Predicting Future Driving Risk of Crash-Involved Drivers Based on a Systematic Machine Learning Framework

The objective of this paper is to predict the future driving risk of crash-involved drivers in Kunshan, China. A systematic machine learning framework is proposed to address three critical technical issues: (1) defining driving risk; (2) constructing risky driving features; and (3) developing a reliable and interpretable machine learning model. High-risk (HR) and low-risk (LR) drivers were defined under five different scenarios. A range of features was extracted from seven years of crash and violation records. Each driver's crash and violation history over the two prior years was used to predict his or her driving risk in the subsequent two years. Using a one-year rolling time window, prediction models were developed for four consecutive time periods: 2013–2014, 2014–2015, 2015–2016, and 2016–2017. Four tree-based ensemble learning techniques were compared: random forest (RF), AdaBoost with decision trees, gradient boosting decision trees (GBDT), and extreme gradient boosting (XGBoost). A temporal transferability test and a follow-up study were used to validate the trained models. The best-performing risk definition was multi-dimensional, encompassing crash recurrence, crash severity, and fault responsibility. GBDT appeared to be the best model choice across all time periods, with an acceptable average precision (AP) of 0.68 on the most recent dataset (2016–2017). Seven of the nine top features were related to risky driving behaviors, and these features showed non-linear relationships with driving risk. Model transferability held only over relatively short time intervals (1–2 years). Risk prediction tasks therefore need to consider an appropriate risk definition, detailed violation and crash features, and advanced machine learning techniques. The proposed machine learning approach is promising and can help target safety interventions more effectively.
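The data design described above (two prior years of records predicting the next two years, rolled forward one year at a time) and the evaluation metric (average precision) can be sketched in standard-library Python. This is a minimal sketch under stated assumptions, not the authors' pipeline: the `Record` schema, the assumption that the seven years of records span 2011 to 2017 (so that the first outcome period, 2013–2014, has two prior years of history), the simplified HR rule (two or more crashes, or any at-fault crash, in the outcome window), and the `baseline_score` placeholder standing in for a trained GBDT are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One crash or violation record (hypothetical schema, not the paper's)."""
    driver_id: str
    year: int
    kind: str               # "crash" or "violation"
    at_fault: bool = False  # only meaningful for crashes


def rolling_windows(first_outcome_year=2013, last_outcome_year=2016, span=2):
    """Yield (feature_years, outcome_years) pairs, rolling forward one year at a time.

    Assumes records span 2011-2017, so each two-year outcome period is
    preceded by a two-year feature (observation) period.
    """
    for y in range(first_outcome_year, last_outcome_year + 1):
        yield (y - span, y - 1), (y, y + span - 1)


def build_dataset(records, feature_years, outcome_years):
    """Return {driver_id: ((n_crashes, n_violations), label)} for crash-involved drivers."""
    (f_lo, f_hi), (o_lo, o_hi) = feature_years, outcome_years
    rows = {}
    for d in sorted({r.driver_id for r in records}):
        past = [r for r in records if r.driver_id == d and f_lo <= r.year <= f_hi]
        future_crashes = [r for r in records
                          if r.driver_id == d and o_lo <= r.year <= o_hi
                          and r.kind == "crash"]
        n_crashes = sum(r.kind == "crash" for r in past)
        n_violations = sum(r.kind == "violation" for r in past)
        if n_crashes == 0:
            continue  # population: drivers with a crash in the feature window
        # Hypothetical HR rule: crash recurrence (2+ crashes) or any at-fault crash.
        label = int(len(future_crashes) >= 2 or any(r.at_fault for r in future_crashes))
        rows[d] = ((n_crashes, n_violations), label)
    return rows


def baseline_score(features):
    """Hypothetical stand-in for a trained model's predicted HR probability."""
    n_crashes, n_violations = features
    return n_crashes + 0.5 * n_violations


def average_precision(y_true, scores):
    """AP = sum over ranks n of (R_n - R_{n-1}) * P_n, ranking by descending score.

    Ties are broken by input order (scikit-learn instead groups tied scores).
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("AP is undefined when there are no positive (HR) drivers")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp, ap, prev_recall = 0, 0.0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i]:
            tp += 1
            recall = tp / n_pos
            ap += (recall - prev_recall) * (tp / rank)
            prev_recall = recall
    return ap


if __name__ == "__main__":
    records = [
        Record("A", 2011, "crash"), Record("A", 2012, "violation"),
        Record("A", 2013, "crash", at_fault=True),
        Record("B", 2012, "crash"),
        Record("C", 2011, "crash"), Record("C", 2012, "violation"),
        Record("C", 2012, "violation"),
        Record("C", 2014, "crash"), Record("C", 2014, "crash"),
    ]
    feature_years, outcome_years = next(rolling_windows())
    data = build_dataset(records, feature_years, outcome_years)
    labels = [label for _, label in data.values()]
    scores = [baseline_score(x) for x, _ in data.values()]
    print(outcome_years, average_precision(labels, scores))
```

To mimic a temporal transferability test with this sketch, one would fit a model on the dataset built from an earlier window and compute `average_precision` on the dataset built from a later window. AP is used rather than accuracy because it summarizes the precision-recall trade-off for the HR class, which matters when HR drivers are a minority.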
