Discovering latent themes in traffic fatal crash narratives using text mining analytics and network topology.

The proliferation of digital textual archives in the transportation safety domain makes it imperative to develop efficient ways of extracting information from textual data sources. The present study uses crash narratives, complemented by crash metadata, to discern the prevalence and co-occurrence of themes that contribute to crash incidents. Ten years (2009-2018) of fatal traffic crash narratives from Michigan were used as a case study. Structural topic modeling (STM) and network topology analysis were used to generate and examine the prevalence and interaction of themes from the crash narratives, which were mainly categorized into pre-crash events, crash locations, and parties involved in the crashes. The main advantage of STM over other topic modeling approaches is that it allows researchers to discover themes from documents and to estimate how those topics relate to document metadata. The topics with the highest prevalence for angle, head-on, rear-end, sideswipe, and single motor vehicle crashes were crash at stop sign, crossing the centerline, unable to stop, lane change maneuver, and run-off-road crash, respectively. The eigenvector centrality measure from the network topology analysis showed that event-related topics were consistently central in articulating the crash occurrence. The centrality of topics and the associations between them varied across crash types. The efficacy of the generated topics in classifying crashes by type was tested using a machine learning algorithm, Random Forest. The classification accuracy in the held-out sample ranged from 89.3% for sideswipe crashes to 99.2% for single motor vehicle crashes. The high classification accuracy suggests that automated crash typing and consistency checks can be accomplished effectively using latent themes extracted from the crash narratives.
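To make the centrality idea concrete, the following is a minimal sketch of eigenvector centrality on a small topic network, computed by power iteration. It is not the study's implementation: the topic names, the edges between them, and the function `eigenvector_centrality` are hypothetical and chosen only for illustration. A topic scores highly when it is linked to other topics that themselves score highly.

```python
import math

def eigenvector_centrality(adj, iters=200, tol=1e-10):
    """Power iteration on a symmetric, non-negative adjacency matrix.

    Each step multiplies by (A + I) rather than A. The eigenvectors are
    unchanged, but the shift avoids oscillation on bipartite graphs.
    """
    n = len(adj)
    x = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x

# Hypothetical topics (assumed for illustration, not the study's output).
topics = ["unable_to_stop", "stop_sign", "intersection", "pedestrian"]

# Hypothetical undirected links between topics that tend to co-occur.
edges = [
    ("unable_to_stop", "stop_sign"),
    ("unable_to_stop", "intersection"),
    ("unable_to_stop", "pedestrian"),
    ("stop_sign", "intersection"),
]

# Build the symmetric adjacency matrix from the edge list.
index = {name: i for i, name in enumerate(topics)}
adj = [[0.0] * len(topics) for _ in topics]
for a, b in edges:
    adj[index[a]][index[b]] = 1.0
    adj[index[b]][index[a]] = 1.0

scores = dict(zip(topics, eigenvector_centrality(adj)))
most_central = max(scores, key=scores.get)

for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name}: {scores[name]:.3f}")
```

In this toy network the event-related topic is linked to every other topic, so it receives the highest score, while the topic with a single link receives the lowest. In practice the edges would come from estimated topic correlations rather than a hand-written list.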
