Clustering Approach toward Large Truck Crash Analysis

Heterogeneity in crash data masks underlying crash patterns and complicates crash analysis. This paper explores an advanced high-dimensional clustering approach for investigating heterogeneity in large datasets. Detailed records of crashes involving large trucks in the state of Florida between 2007 and 2016 were examined to identify truck crash patterns and the significant conditions contributing to those patterns. The block clustering method was applied to more than 220,000 crash records with nearly 200 attributes. The analysis showed promising results in segmenting a large heterogeneous dataset into meaningful subgroups (with a 95.72% average degree of homogeneity for selected blocks). Goodness of fit was evaluated for the clustering methods, and both the integrated completed likelihood (ICL) and pseudo-likelihood values improved significantly (by 20.8% and 21.1%, respectively). Attribute clustering showed distinct characteristics for each cluster. Crash clustering revealed significant differences among the clusters and suggested that this crash dataset could be partitioned into same-direction, opposing-direction, and single-vehicle crashes. Individual blocks defined by both row and column clustering were further investigated to better understand the sets of contributing conditions that lead to large truck crashes. Major features of each of the three major crash types were analyzed, which may provide additional insights for developing countermeasures and strategies that target specific segments. The clustering approach could be used as a preanalysis method to identify homogeneous subgroups for further analysis, which would help enhance the effectiveness of safety programs.
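To make the idea of block homogeneity concrete, the following minimal Python sketch scores each block of a binary crash-by-attribute matrix once row (crash) and column (attribute) cluster labels are available. It is an illustration only, not the authors' procedure: the function name `block_homogeneity`, the toy data, and the definition of homogeneity as the share of cells equal to the block's majority value are all assumptions, since the abstract does not state how the degree of homogeneity was computed.

```python
from collections import defaultdict


def block_homogeneity(matrix, row_labels, col_labels):
    """Return {(row_cluster, col_cluster): homogeneity} for a binary matrix.

    Homogeneity is taken here to be the share of cells in a block that equal
    the block's majority value (an assumed definition, for illustration).
    """
    ones = defaultdict(int)   # count of 1-cells per block
    cells = defaultdict(int)  # total cells per block
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            key = (row_labels[i], col_labels[j])
            ones[key] += value
            cells[key] += 1
    return {
        key: max(ones[key], cells[key] - ones[key]) / cells[key]
        for key in cells
    }


# Hypothetical toy data: 4 crashes (rows) x 4 binary attributes (columns).
crashes = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
row_labels = [0, 0, 1, 1]  # assumed crash clusters
col_labels = [0, 0, 1, 1]  # assumed attribute clusters

scores = block_homogeneity(crashes, row_labels, col_labels)
average = sum(scores.values()) / len(scores)
```

In this toy case the two diagonal blocks are fully homogeneous and the two off-diagonal blocks each contain one minority cell, so the average across the four blocks is 0.875. In practice the labels would come from a fitted co-clustering model rather than being set by hand.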
