Applying machine learning and geolocation techniques to social media data (Twitter) to develop a resource for urban planning.

With all the recent attention focused on big data, it is easy to overlook that basic vital statistics remain difficult to obtain in most of the world. This project set out to test whether an openly available dataset (Twitter) could be transformed into a resource for urban planning and development. The hypothesis was tested by creating road traffic crash location data, which are scarce in most resource-poor environments but essential for addressing the leading cause of death for children over age five and young adults. The research project scraped 874,588 traffic-related tweets in Nairobi, Kenya, applied a machine learning model to detect reports of a crash, and developed an improved geoparsing algorithm to identify each crash location. The project geolocated 32,991 crash reports in Twitter for 2012-2020 and clustered them into 22,872 unique crashes to produce one of the first crash maps for Nairobi. A motorcycle delivery service was dispatched in real time to verify a subset of crashes, showing 92 percent accuracy. A spatial clustering algorithm identified portions of the road network (less than 1 percent) where 50 percent of the geolocated crashes occurred. Even with limitations in the representativeness of the data, the results can give urban planners useful information for targeting road safety improvements where resources are limited.
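The step of merging multiple geolocated crash reports into unique crashes can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes that two reports describe the same crash when they fall within a distance threshold and a time window of each other, and it links such reports with a simple union-find. The names `Report`, `haversine_m`, and `cluster_reports`, and the default thresholds `max_m` and `max_min`, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Report:
    lat: float     # latitude in decimal degrees
    lon: float     # longitude in decimal degrees
    minute: float  # minutes since an arbitrary reference time


def haversine_m(a: Report, b: Report) -> float:
    """Great-circle distance in metres between two reports."""
    earth_radius_m = 6_371_000.0
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = (sin(dlat / 2) ** 2
         + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2)
    return 2 * earth_radius_m * asin(sqrt(h))


def cluster_reports(reports, max_m=500.0, max_min=120.0):
    """Label each report with a cluster id (one id per assumed unique crash).

    max_m and max_min are illustrative thresholds, not values from the paper.
    """
    parent = list(range(len(reports)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(reports)):
        for j in range(i + 1, len(reports)):
            close_in_time = abs(reports[i].minute - reports[j].minute) <= max_min
            if close_in_time and haversine_m(reports[i], reports[j]) <= max_m:
                parent[find(j)] = find(i)  # merge the two clusters

    return [find(i) for i in range(len(reports))]


reports = [
    Report(-1.2921, 36.8219, 0),    # first report of a crash
    Report(-1.2925, 36.8222, 30),   # nearby, 30 minutes later
    Report(-1.2921, 36.8219, 600),  # same place, ten hours later
    Report(-1.3000, 36.9000, 10),   # same time, several kilometres away
]
labels = cluster_reports(reports)
unique_crashes = len(set(labels))
```

The pairwise loop is quadratic in the number of reports, which is acceptable for a sketch; a dataset of tens of thousands of reports would likely call for a spatial index or a library clustering routine.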
