Factor complexity of crash occurrence: An empirical demonstration using boosted regression trees.

Factor complexity is a characteristic of traffic crashes. This paper proposes a novel approach, boosted regression trees (BRT), for investigating the complex and nonlinear relationships in high-variance traffic crash data. Taiwanese single-vehicle motorcycle crash data from 2004-2005 are used to demonstrate the utility of BRT. Conventional logistic regression and classification and regression tree (CART) models are also estimated so that their estimation results and external validity can be compared with those of BRT. Both in-sample cross-validation and out-of-sample validation show that increasing tree complexity improves classification performance, although with diminishing gains, which indicates that single-vehicle motorcycle crashes have limited factor complexity. The effects of crucial variables, including geographical, temporal, and sociodemographic factors, explain some fatal crashes. Relatively unique fatal crashes are better approximated by interaction terms, especially combinations of behavioral factors. BRT models generally offer better transferability than conventional logistic regression and CART models. The study also discusses the implications of these results for devising safety policies.
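The abstract links tree complexity to the order of interactions a model can represent. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. It uses hypothetical simulated data (two binary predictors, `x0` and `x1`, whose effects on fatality interact, plus four pure-noise predictors), a from-scratch gradient-boosting learner with Bernoulli deviance in the style of Friedman's gradient boosting machine, and out-of-sample AUC as one common measure of classification performance. Tree complexity is taken here to be the maximum depth of each tree: depth 1 (stumps) gives a model that is additive on the log-odds scale, much like a main-effects logistic regression, while depth d allows up to d-way interactions. All function names and parameter values (`simulate`, `fit_brt`, `n_trees=100`, `learning_rate=0.1`, `min_leaf=20`) are illustrative choices.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n, rng):
    """Hypothetical crash data: x0 and x1 interact; x2..x5 are pure noise."""
    fatal_prob = {(0, 0): 0.05, (0, 1): 0.40, (1, 0): 0.80, (1, 1): 0.20}
    X, y = [], []
    for _ in range(n):
        x = [rng.randint(0, 1) for _ in range(6)]
        X.append(x)
        y.append(1 if rng.random() < fatal_prob[(x[0], x[1])] else 0)
    return X, y


def build_tree(X, resid, hess, idx, depth, min_leaf):
    """Least-squares split search on binary features, Newton-step leaf values."""
    def leaf():
        # Bernoulli-deviance Newton step: sum(y - p) / sum(p * (1 - p)).
        return ("leaf", sum(resid[i] for i in idx) / sum(hess[i] for i in idx))

    if depth == 0:
        return leaf()
    total = sum(resid[i] for i in idx)
    best_gain, best_j = 1e-12, None
    for j in range(len(X[0])):
        right = [i for i in idx if X[i][j] == 1]
        n_r, n_l = len(right), len(idx) - len(right)
        if n_r < min_leaf or n_l < min_leaf:
            continue
        s_r = sum(resid[i] for i in right)
        s_l = total - s_r
        # Reduction in squared error of the residuals from splitting on x_j.
        gain = s_l * s_l / n_l + s_r * s_r / n_r - total * total / len(idx)
        if gain > best_gain:
            best_gain, best_j = gain, j
    if best_j is None:
        return leaf()
    left = [i for i in idx if X[i][best_j] == 0]
    right = [i for i in idx if X[i][best_j] == 1]
    return ("split", best_j,
            build_tree(X, resid, hess, left, depth - 1, min_leaf),
            build_tree(X, resid, hess, right, depth - 1, min_leaf))


def predict_tree(tree, x):
    while tree[0] == "split":
        _, j, left, right = tree
        tree = right if x[j] == 1 else left
    return tree[1]


def fit_brt(X, y, tree_complexity, n_trees=100, learning_rate=0.1, min_leaf=20):
    """Hypothetical BRT sketch: boosting on the log-odds; tree_complexity = depth."""
    p0 = sum(y) / len(y)
    f0 = math.log(p0 / (1 - p0))
    F = [f0] * len(y)
    trees = []
    for _ in range(n_trees):
        p = [sigmoid(f) for f in F]
        resid = [yi - pi for yi, pi in zip(y, p)]  # negative gradient of deviance
        hess = [pi * (1 - pi) for pi in p]
        tree = build_tree(X, resid, hess, list(range(len(y))), tree_complexity, min_leaf)
        trees.append(tree)
        F = [f + learning_rate * predict_tree(tree, x) for f, x in zip(F, X)]
    return f0, trees, learning_rate


def predict_logit(model, x):
    f0, trees, learning_rate = model
    return f0 + learning_rate * sum(predict_tree(t, x) for t in trees)


def auc(scores, labels):
    """Mann-Whitney AUC; tied scores count as one half."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


rng = random.Random(2004)
X_train, y_train = simulate(2000, rng)
X_test, y_test = simulate(2000, rng)  # out-of-sample validation set
results = {}
for tc in (1, 2, 3):
    model = fit_brt(X_train, y_train, tree_complexity=tc)
    scores = [predict_logit(model, x) for x in X_test]
    results[tc] = auc(scores, y_test)
    print(f"tree complexity {tc}: out-of-sample AUC = {results[tc]:.3f}")
```

Because an additive model cannot let the effect of `x1` change sign across levels of `x0`, the depth-1 model should rank the test cases noticeably worse than the depth-2 model, while depth 3 should add little because the simulated signal contains no three-way interaction. This pattern of improving but diminishing returns parallels the abstract's finding, but the numbers come from simulated data only.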
