Extracting accurate location information from a highly inaccurate traffic accident dataset: A methodology based on a string matching technique

The objective of this research was to develop a model for validating traffic accident locations that would be applicable worldwide, regardless of linguistic or cultural differences. To achieve this, the model relies on a Volunteered Geographic Information (VGI) dataset, the OpenStreetMap (OSM) project. To test the developed model, a total of 8550 accidents with fatal or non-fatal injuries that occurred in the City of Zagreb from 2010 to 2014 were evaluated. Traffic accident data were collected using the pen-and-paper method, while the accident locations were determined using Global Positioning System (GPS) receivers embedded in police vehicles. This form of data entry invariably introduces errors in both geometric and contextual attributes. To counteract these errors, the developed model combines two key concepts: the Jaro–Winkler string matching technique and the Inverse Distance Weighting method. Over 66% of traffic accident locations were validated, an increase of 15% compared to the classical approach. The model outlined in this paper thus shows a significant improvement in estimating the correct location of traffic accidents, which in turn drastically reduces the resources needed to assess the quality of accident locations.
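The two techniques named in the abstract can be illustrated with a minimal Python sketch. It shows the textbook forms of Jaro–Winkler similarity and Inverse Distance Weighting only; the abstract does not describe how the paper combines them, so the function names, the prefix scale of 0.1, the power parameter of 2, and the street-name candidates below are assumptions made for illustration, not the authors' implementation.

```python
def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


def idw(values, distances, power: float = 2.0) -> float:
    """Inverse Distance Weighting: nearer samples get larger weights."""
    for value, dist in zip(values, distances):
        if dist == 0:
            return float(value)  # a sample at the query point decides the result
    weights = [1.0 / dist ** power for dist in distances]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical example: match a misspelled recorded street name
# against candidate street names taken from a map dataset.
recorded = "Savska cetsa"
candidates = ["Savska cesta", "Slavonska avenija", "Ilica"]
best = max(candidates, key=lambda name: jaro_winkler(recorded.lower(), name.lower()))
```

In this sketch the string similarity tolerates transposed or mistyped characters in a hand-written street name, while the distance weighting lets closer candidates count for more than distant ones.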
