On fine-grained geolocalisation of tweets and real-time traffic incident detection

Recently, the geolocalisation of tweets has become important for a wide range of real-time applications, including event detection, topic detection, and disaster and emergency analysis. However, the number of relevant geotagged tweets available to support such tasks remains insufficient. To overcome this limitation, predicting the location of non-geotagged tweets, while challenging, can increase the sample of geotagged data, which benefits a wide range of applications. In this paper, we propose a location inference method that combines a ranking approach with majority voting over the ranked tweets, where each vote is weighted by evidence gathered from the ranking. Using geotagged tweets from two US cities, Chicago and New York, our experimental results demonstrate that our method statistically significantly outperforms state-of-the-art baselines in terms of accuracy and error distance in both cities, at the cost of decreased coverage. Finally, we investigate the applicability of our method in a real-time scenario by means of a traffic incident detection task. Our analysis shows that our fine-grained geolocalisation method can overcome the limitations of geotagged tweets and precisely map incident-related tweets to the real location of the incident.
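The ranking-plus-weighted-voting idea, and why it trades coverage for accuracy, can be illustrated with a minimal Python sketch. It assumes that a text-retrieval model (not shown) has already returned a ranked list of geotagged tweets for a non-geotagged query tweet, each carrying a hypothetical grid-cell id, that cell's centroid and a non-negative retrieval score. Each vote is weighted by its score, and the predictor abstains when the winning cell holds less than `min_share` of the total weight. This weighting, the `top_k` cut-off, the threshold and the helper names (`weighted_majority_vote`, `haversine_km`, `evaluate`) are illustrative assumptions, not the paper's exact formulation. The `evaluate` helper uses an assumed 1 km radius and a haversine error distance.

```python
import math
from collections import defaultdict
from typing import Optional

# Hypothetical candidate record: (grid-cell id, cell centroid as (lat, lon), retrieval score).
Candidate = tuple[str, tuple[float, float], float]
Point = tuple[float, float]

EARTH_RADIUS_KM = 6371.0


def weighted_majority_vote(
    ranked: list[Candidate], top_k: int = 10, min_share: float = 0.5
) -> Optional[tuple[str, Point]]:
    """Vote over the top_k ranked geotagged tweets, each weighted by its score.

    Returns (cell_id, centroid) for the cell with the largest total weight,
    or None (abstain) if that cell holds less than min_share of the weight.
    """
    votes: dict[str, float] = defaultdict(float)
    centroids: dict[str, Point] = {}
    for cell, centroid, score in ranked[:top_k]:
        votes[cell] += max(score, 0.0)  # negative scores contribute nothing
        centroids[cell] = centroid
    total = sum(votes.values())
    if total == 0.0:
        return None  # no usable evidence
    best = max(votes, key=votes.get)
    if votes[best] / total < min_share:
        return None  # evidence too split: abstain, lowering coverage
    return best, centroids[best]


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def evaluate(
    predictions: list[Optional[Point]], truths: list[Point], radius_km: float = 1.0
) -> tuple[float, float]:
    """Return (accuracy, coverage).

    Coverage is the fraction of tweets that received a prediction; accuracy is
    the fraction of those predictions within radius_km of the true location.
    """
    answered = [(p, t) for p, t in zip(predictions, truths) if p is not None]
    coverage = len(answered) / len(truths) if truths else 0.0
    if not answered:
        return 0.0, coverage
    hits = sum(haversine_km(p, t) <= radius_km for p, t in answered)
    return hits / len(answered), coverage


if __name__ == "__main__":
    ranked = [
        ("c1", (41.88, -87.63), 3.0),
        ("c2", (41.90, -87.65), 1.0),
        ("c1", (41.88, -87.63), 2.0),
    ]
    print(weighted_majority_vote(ranked))                 # c1 holds 5/6 of the weight
    print(weighted_majority_vote(ranked, min_share=0.9))  # None: abstains
```

Raising `min_share` makes the predictor answer fewer tweets (lower coverage), but it only answers when the ranked evidence agrees strongly, which is one simple way to obtain the accuracy/coverage trade-off described above.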
