Clustering and profiling traffic roads by means of accident data

Identifying geographical locations with high accident risk by means of clustering techniques, and profiling them in terms of accident-related data and location characteristics by means of data mining techniques, can provide valuable input for government actions towards traffic safety. In the first part of this research, an innovative method based on latent class clustering (also called model-based clustering or finite mixture modelling) is used to cluster traffic roads into distinct groups based on their similar accident frequencies. The data are obtained from the Belgian "Analysis Form for Traffic Accidents" that should be filled out by a police officer for each traffic accident with killed or seriously injured casualties that occurs on a public road in Belgium. More specifically, this analysis focuses on 19 central roads of the city of Hasselt for 3 consecutive time periods: 1992-1994, 1995-1997 and 1998-2000. The observed accident frequencies are assumed to originate from a mixture of density distributions for which the parameters of the distributions and the number and size of the segments are unknown. It is the objective of latent class clustering to 'unmix' the distributions and to find the optimal parameters of the distributions and the number and size of the segments, given the underlying data. The development and use of the model is described. In the second part of this study, the data mining technique of association rules is used to profile each cluster of traffic roads in terms of the available traffic accident data. The strength of this approach lies in the identification of relevant variables that make a strong contribution towards a better understanding of the accident circumstances for each group of traffic roads.
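
The 'unmixing' idea behind latent class clustering can be illustrated with a minimal sketch. It is not the model used in the study: it assumes a univariate Poisson mixture fitted by expectation-maximisation (EM) on a single count per road, whereas the study considers three time periods. The accident counts below are invented for illustration, the function names are hypothetical, and the use of BIC to compare numbers of segments is an assumption, since the abstract does not name a selection criterion.

```python
import math

def poisson_log_pmf(k, lam):
    """Log-probability of observing count k under a Poisson(lam) distribution."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def fit_poisson_mixture(counts, n_components, n_iter=200):
    """Fit a univariate Poisson mixture with EM (illustrative sketch only)."""
    lo, hi = min(counts), max(counts)
    # Deterministic start: spread the initial rates over the observed range.
    rates = [lo + (hi - lo) * (j + 0.5) / n_components + 1e-3
             for j in range(n_components)]
    weights = [1.0 / n_components] * n_components

    def e_step():
        resp, loglik = [], 0.0
        for k in counts:
            logs = [math.log(w) + poisson_log_pmf(k, r)
                    for w, r in zip(weights, rates)]
            m = max(logs)  # log-sum-exp trick for numerical stability
            probs = [math.exp(v - m) for v in logs]
            s = sum(probs)
            loglik += m + math.log(s)
            resp.append([p / s for p in probs])
        return resp, loglik

    for _ in range(n_iter):
        resp, _ = e_step()
        # M-step: update segment sizes (weights) and Poisson rates.
        for j in range(n_components):
            nj = sum(r[j] for r in resp)
            total = sum(r[j] * k for r, k in zip(resp, counts))
            weights[j] = max(nj / len(counts), 1e-12)
            rates[j] = max(total / max(nj, 1e-12), 1e-6)

    resp, loglik = e_step()  # responsibilities under the final parameters
    return weights, rates, resp, loglik

def bic(loglik, n_components, n_obs):
    """BIC with (K - 1) free weights and K rates; lower is better."""
    n_params = 2 * n_components - 1
    return -2.0 * loglik + n_params * math.log(n_obs)

# Hypothetical accident counts for 19 roads (NOT the Hasselt data).
road_counts = [0, 1, 1, 2, 0, 1, 2, 1, 0, 1, 9, 11, 10, 12, 8, 10, 9, 11, 10]

bic_by_k = {}
for n_segments in (1, 2, 3):
    _, _, _, ll = fit_poisson_mixture(road_counts, n_segments)
    bic_by_k[n_segments] = bic(ll, n_segments, len(road_counts))

weights, rates, resp, loglik = fit_poisson_mixture(road_counts, 2)
# Assign each road to the segment with the highest posterior probability.
labels = [max(range(2), key=lambda j: r[j]) for r in resp]
```

Here `weights` plays the role of the segment sizes and `rates` the distribution parameters; comparing `bic_by_k` across candidate values is one common way to choose the number of segments.
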
Since the clusters show different results for the overall accident 'risk' on the roads, one could expect that not every accident variable will be of equal importance when describing the different groups of traffic roads. Therefore, a comparative analysis between the accident characteristics of the different clusters is conducted, which provides new insights into the complexity and causes of road accidents. For the covering abstract see ITRD E126595.
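
The profiling step can be sketched in the same spirit. The example below is a simplified illustration, not the procedure used in the study: it only mines rules with a single item on each side by brute force, rather than running a full frequent-itemset algorithm. The accident attributes, records, thresholds and the `mine_rules` helper are all hypothetical.

```python
from itertools import combinations

def mine_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence, lift) tuples
    for single-item rules A -> B that pass both thresholds."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(itemset):
        # Fraction of accident records that contain every item in itemset.
        return sum(1 for s in sets if itemset <= s) / n

    items = sorted(set().union(*sets))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support(frozenset((a, b)))
        if pair_support < min_support:
            continue
        for ante, cons in ((a, b), (b, a)):
            confidence = pair_support / support(frozenset((ante,)))
            # Lift above 1 means the items co-occur more than if independent.
            lift = confidence / support(frozenset((cons,)))
            if confidence >= min_confidence:
                rules.append((ante, cons, pair_support, confidence, lift))
    return rules

# Hypothetical accident records for one cluster of roads (invented values).
cluster_accidents = [
    {"wet_road", "night", "pedestrian"},
    {"wet_road", "night"},
    {"wet_road", "night", "intersection"},
    {"dry_road", "day", "intersection"},
    {"wet_road", "day"},
]

rules = mine_rules(cluster_accidents)
for ante, cons, sup, conf, lift in rules:
    print(f"{ante} -> {cons}: support={sup:.2f}, "
          f"confidence={conf:.2f}, lift={lift:.2f}")
```

Running the same function on the accident records of each cluster and comparing the resulting rule sets gives a rough analogue of the comparative analysis described above.
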
