Comparison and Validation of Injury Risk Classifiers for Advanced Automated Crash Notification Systems

Objective: The odds of death for a seriously injured crash victim are drastically reduced if he or she receives care at a trauma center. Advanced automated crash notification (AACN) algorithms are postcrash safety systems that use data measured by the vehicle during the crash to predict the likelihood that occupants are seriously injured. The accuracy of these models is crucial to the success of an AACN system. The objective of this study was to compare the predictive performance of competing injury risk models and algorithms: logistic regression, random forest, AdaBoost, naïve Bayes, support vector machine, and k-nearest neighbors classification.

Methods: This study compared machine learning algorithms to the widely adopted logistic regression modeling approach. Machine learning algorithms have not been commonly studied in the motor vehicle injury literature. They may have higher predictive power than logistic regression, with the drawback that they do not support statistical inference. To evaluate the performance of these algorithms, data on 16,398 vehicles involved in non-rollover collisions were extracted from the National Automotive Sampling System Crashworthiness Data System (NASS-CDS). A vehicle was defined as requiring trauma center treatment for its occupants if any occupant had an Injury Severity Score (ISS) of 15 or greater. The performance of each model was evaluated using cross-validation, which assesses how a model will perform on new data not used for model training. The crash ΔV (change in velocity during the crash), damage side (struck side of the vehicle), seat belt use, vehicle body type, number of events, occupant age, and occupant sex were used as predictors in each model.

Results and Conclusions: Logistic regression slightly outperformed the machine learning algorithms based on the sensitivity and specificity of the models. Previous studies on AACN risk curves used the same data to train and test the models and, as a result, reported higher sensitivity than the cross-validated results from this study. Future studies should account for performance on future data, for example by using cross-validation; otherwise they risk presenting optimistic predictions of field performance. Past algorithms have been criticized for relying on occupant age and sex, which are difficult to measure with vehicle sensors, and for inaccuracies in classifying damage side. The models with accurate damage side and with age and sex did outperform models with less accurate damage side and without age and sex, but the differences were small, suggesting that the success of AACN does not depend on these predictors.
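
The gap between in-sample and cross-validated performance described above can be illustrated with a minimal sketch. It is not the study's code, data, or models: the ΔV values are synthetic, the 20% serious-injury rate and the distribution parameters are arbitrary assumptions, and a 1-nearest-neighbor classifier stands in for the algorithms compared in the study because it makes the optimism of in-sample testing easy to see. All function names are hypothetical.

```python
import random


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def predict_1nn(train_x, train_y, x):
    """Predict the label of the closest training point (1-nearest neighbor)."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def k_fold_indices(n, k, rng):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_predictions(xs, ys, k, rng):
    """Predict each case from a model trained on the other k-1 folds."""
    preds = [None] * len(xs)
    for fold in k_fold_indices(len(xs), k, rng):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        for i in fold:
            preds[i] = predict_1nn(train_x, train_y, xs[i])
    return preds


# Synthetic stand-in data (assumption): label 1 means at least one occupant
# with ISS >= 15; x is a made-up delta-V in km/h that overlaps between classes.
rng = random.Random(0)
ys = [1 if rng.random() < 0.2 else 0 for _ in range(300)]
xs = [rng.gauss(45.0 if y == 1 else 25.0, 12.0) for y in ys]

# In-sample: every case is its own nearest neighbor, so the fit looks perfect.
in_sample_pred = [predict_1nn(xs, ys, x) for x in xs]
in_sample = sensitivity_specificity(ys, in_sample_pred)

# Cross-validated: each case is predicted without being in the training set.
cv_pred = cross_validated_predictions(xs, ys, k=10, rng=rng)
cross_validated = sensitivity_specificity(ys, cv_pred)

print("in-sample (sensitivity, specificity):", in_sample)
print("10-fold CV (sensitivity, specificity):", cross_validated)
```

Because each training case is its own nearest neighbor, the in-sample figures here are perfect by construction, while the cross-validated figures reflect how the classifier handles cases it has not seen. Less flexible models such as logistic regression would typically show a smaller gap, but the direction of the bias is the same.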
