Uncovering the Location of Twitter Users

Social networks such as Twitter and Facebook are valuable sources for monitoring real-time events, such as earthquakes and epidemics. For this type of surveillance the user's location is an essential piece of information, but a substantial number of users choose not to disclose their geographical information. However, characteristics of the users' behavior, such as the friends they associate with and the types of messages they publish, may hint at their spatial location. In this paper, we present a method to infer the spatial location of Twitter users. Unlike the approaches proposed so far, we incorporate two sources of information to learn the geographical position: the text posted by users and their friendship network. We propose a probabilistic approach that jointly models the geographical labels and the Twitter texts of the users, organized in the form of a graph representing the friendship network. We use a Markov random field probability model to represent the network, and learning is carried out through a Markov chain Monte Carlo simulation technique that approximates the posterior probability distribution of the missing geographical labels. We demonstrate the utility of this model on a large dataset of Twitter users, where the ground truth is the location given by the GPS position, GeoIP location, or declared location. The method is evaluated and compared to two baseline algorithms, each of which employs only one of the two sources of information (text or friendship network). The accuracy rates achieved are significantly better than those of the baseline methods.
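To make the general idea concrete, the following is a minimal sketch of inferring missing location labels on a friendship graph. It is not the paper's implementation: the abstract names only a Markov random field and a Markov chain Monte Carlo technique, so the choice of Gibbs sampling, the Potts-style pairwise term (`coupling`), the function name `gibbs_infer_locations`, the toy graph, and the per-user text log-likelihoods are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def gibbs_infer_locations(neighbors, known, text_loglik, locations,
                          coupling=1.0, sweeps=500, burn_in=100, seed=0):
    """Approximate posterior marginals of missing location labels.

    neighbors:   dict user -> list of friends (undirected friendship graph)
    known:       dict user -> observed location (held fixed during sampling)
    text_loglik: dict user -> dict location -> log P(user's text | location)
    locations:   list of candidate location labels
    coupling:    assumed Potts-style weight rewarding friends who share a label
    """
    rng = random.Random(seed)
    unknown = [u for u in neighbors if u not in known]

    # Start from the observed labels plus a random guess for each missing one.
    labels = dict(known)
    for u in unknown:
        labels[u] = rng.choice(locations)

    counts = {u: Counter() for u in unknown}
    for sweep in range(sweeps):
        for u in unknown:
            # Unnormalized conditional log-probability of each label given
            # the current labels of u's friends and u's own text.
            scores = []
            for loc in locations:
                agree = sum(1 for v in neighbors[u] if labels.get(v) == loc)
                scores.append(coupling * agree + text_loglik[u][loc])
            top = max(scores)
            weights = [math.exp(s - top) for s in scores]
            labels[u] = rng.choices(locations, weights=weights)[0]
            if sweep >= burn_in:
                counts[u][labels[u]] += 1

    kept = sweeps - burn_in
    return {u: {loc: counts[u][loc] / kept for loc in locations}
            for u in unknown}


# Toy data (entirely hypothetical): five users, two candidate locations.
neighbors = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
known = {"a": "north", "b": "north", "e": "south"}
locations = ["north", "south"]
text_loglik = {
    "c": {"north": math.log(0.6), "south": math.log(0.4)},
    "d": {"north": math.log(0.2), "south": math.log(0.8)},
}

posterior = gibbs_infer_locations(neighbors, known, text_loglik, locations)
for user, dist in posterior.items():
    print(user, dist)
```

In this sketch each unlabeled user is repeatedly resampled from a conditional distribution that combines agreement with friends' current labels and the user's own text evidence; the label frequencies after burn-in serve as an estimate of the posterior over the missing labels.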
