Analysis of Factors Affecting the Severity of Automated Vehicle Crashes Using XGBoost Model Combining POI Data

The research and development of autonomous vehicle (AV) technology have been gaining ground globally. However, few studies have explored in depth the factors that contribute to crashes involving AVs. This study aims to predict the severity of crashes involving AVs and to analyze the effects of different factors on crash severity. Crash data were obtained from the AV-related crash reports submitted to the California Department of Motor Vehicles in 2019 and comprised 75 non-injury and 18 injury crashes. Points-of-interest (POI) data were collected from the Google Maps Application Programming Interface (API). Descriptive statistical analysis was applied to examine the features of crashes involving AVs in terms of collision type, crash severity, vehicle movement preceding the collision, and degree of vehicle damage. To compare classification performance, we used two classification models: eXtreme Gradient Boosting (XGBoost) and Classification and Regression Tree (CART). The results show that the XGBoost model performs better in identifying injury crashes involving AVs. Compared with the original XGBoost model, the recall and G-mean of the XGBoost model combining POI data improved by 100% and 11.1%, respectively. The main features that contribute to crash severity include weather, degree of vehicle damage, accident location, and collision type. The results indicate that crash severity increases significantly when an AV collides at an intersection under extreme weather conditions (e.g., fog and snow). Moreover, crashes resulting in injuries were more likely to occur in areas where land-use patterns are highly diverse. The knowledge gained from this research could ultimately contribute to assessing and improving the safety performance of current AVs.
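
The abstract reports recall and G-mean but does not define them. The sketch below assumes the standard definitions for an imbalanced binary problem: recall is the share of injury crashes that are correctly identified, and G-mean is the geometric mean of recall and specificity. The function name `recall_and_gmean` and the example labels are hypothetical and are not taken from the study's data.

```python
import math


def recall_and_gmean(y_true, y_pred, positive=1):
    """Return (recall, G-mean) for binary labels.

    Assumed definitions (not stated in the abstract):
      recall      = TP / (TP + FN)
      specificity = TN / (TN + FP)
      G-mean      = sqrt(recall * specificity)
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)

    # Guard against empty classes to avoid division by zero.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return recall, math.sqrt(recall * specificity)


# Made-up labels for illustration only: 1 = injury crash, 0 = non-injury crash.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

recall, gmean = recall_and_gmean(y_true, y_pred)
print(f"recall={recall:.3f}, G-mean={gmean:.3f}")
```

Because the injury class is small (18 of 93 crashes), a classifier can reach high accuracy while missing most injury crashes. Recall and G-mean penalize that behavior, which is presumably why they are the metrics compared here.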
