Can crowdsourcing create the missing crash data?

Updated June 1, 2020. Road traffic crashes (RTCs) are the primary cause of death among children and young adults. Yet data on RTCs is incomplete, hindering effective road safety policymaking in many developing countries, where mortality is purportedly highest. We web-scrape 850,000 tweets to create crash data and develop a machine learning algorithm to geolocate RTCs. Our algorithm is nearly twice as precise as a standard geoparsing algorithm in identifying the set of locations that include the crash location. In addition, it identifies the unique location of a crash from the set of possible locations in a majority of cases. We dispatch motorcycle drivers to the site of each presumed crash in real time to verify the crowdsourced data and document the algorithm's performance. The study can serve as a proof of concept for countries interested in improving RTC data at low cost through a machine learning approach, substantially increasing the data available to analyze RTCs and prioritize road safety policies.