A comparative analysis of heterogeneity in road accident data using data mining techniques

Road accidents are among the leading causes of untimely death and of economic loss to public and private property. Road safety refers to planning and implementing strategies to reduce road and traffic accidents. Analysing road accident data is an important means of identifying the factors associated with accidents and can help reduce the accident rate. The heterogeneity of road accident data is a major challenge in road safety analysis. In this study, we apply latent class clustering (LCC) and k-modes clustering to a new road accident data set from Haridwar, Uttarakhand, India, with the main aim of identifying which technique performs better. We first apply LCC and k-modes clustering to the road accident data to form clusters. We then apply the Frequent Pattern (FP) growth technique to each cluster and to the entire data set (EDS). The rules generated for the clusters do not show either clustering technique to be superior to the other. However, both techniques are well suited to reducing the heterogeneity of road accident data. The rules generated for each cluster and for the EDS indicate that heterogeneity exists in the entire data set, and that clustering prior to analysis reduces this heterogeneity and yields better results. The rules for Haridwar district reveal important information that can be used to develop policies to prevent accidents and reduce the accident rate.
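
The cluster-then-mine workflow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The records, the field names (time, road, victim), the support threshold, and the helper names k_modes and frequent_itemsets are all hypothetical and chosen only for illustration. The clustering step is a bare-bones k-modes (Hamming distance, per-attribute modes) with a deterministic initialisation, and the pattern step uses brute-force itemset counting as a stand-in for FP growth, which is designed to find the same frequent itemsets more efficiently. LCC is not shown.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records (NOT the Haridwar data): (time, road, victim).
FIELDS = ("time", "road", "victim")
records = [
    ("night", "highway", "two_wheeler"),
    ("night", "highway", "two_wheeler"),
    ("night", "highway", "pedestrian"),
    ("day", "local", "pedestrian"),
    ("day", "local", "pedestrian"),
    ("day", "local", "two_wheeler"),
]

def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def k_modes(data, k, max_iter=20):
    """Bare-bones k-modes: assign to nearest mode, then recompute modes."""
    # Assumption: initialise with the first k distinct records (deterministic).
    modes = []
    for r in data:
        if r not in modes:
            modes.append(r)
        if len(modes) == k:
            break
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(modes)), key=lambda j: hamming(r, modes[j]))
            for r in data
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(len(modes)):
            members = [r for r, lab in zip(data, labels) if lab == j]
            if members:
                # New mode: most frequent value of each attribute.
                modes[j] = tuple(
                    Counter(col).most_common(1)[0][0] for col in zip(*members)
                )
    return labels, modes

def frequent_itemsets(data, fields, min_support=0.6, max_size=2):
    """Brute-force frequent itemsets (stand-in for FP growth)."""
    counts = Counter()
    for r in data:
        items = sorted(zip(fields, r))  # (field, value) pairs
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    n = len(data)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}

labels, modes = k_modes(records, k=2)
print("labels:", labels)
print("EDS itemsets:", frequent_itemsets(records, FIELDS))
for j in range(len(modes)):
    cluster = [r for r, lab in zip(records, labels) if lab == j]
    print("cluster", j, "itemsets:", frequent_itemsets(cluster, FIELDS))
```

On this constructed example, no itemset reaches the support threshold in the entire data set, while each cluster yields several frequent itemsets. This mirrors the abstract's point that clustering before mining can expose patterns that heterogeneity hides, but the toy data are built to show that effect and are not evidence for it.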
