Population Bias in Geotagged Tweets

Geotagged tweets are an exciting and increasingly popular data source, but like all social media data, they may be biased in who is represented. Motivated by this, we investigate the question: are users of geotagged tweets randomly distributed over the US population? We link approximately 144 million geotagged tweets within the US, representing 2.6 million unique users, to high-resolution Census population data and carry out a statistical test that answers this question strongly in the negative. We use spatial models and integrate further Census data to investigate the factors associated with this nonrandom distribution. We find that, controlling for other factors, population has no effect on the number of geotag users. Instead, the number of users is predicted by several factors, including higher median income, being in an urban area, being further east or on a coast, having more young people, and having large Asian, Black, or Hispanic/Latino populations.
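The abstract does not name the statistical test. One standard way to check whether users are randomly distributed over the population is a Pearson chi-square goodness-of-fit test against the null hypothesis that each user falls in a geographic unit with probability proportional to that unit's population. The sketch below assumes that test, which is not necessarily the authors' method. The function name `chi_square_proportional` and the example counts are hypothetical. To use only the Python standard library, the p-value relies on the Wilson-Hilferty normal approximation to the chi-square distribution, which is accurate when there are many units, as with Census geographies.

```python
import math


def chi_square_proportional(counts, populations):
    """Hypothetical helper: Pearson chi-square goodness-of-fit test of
    H0: users are allocated to regions in proportion to population.

    counts      -- observed number of unique users in each region
    populations -- Census population of each region (must be positive)
    Returns (statistic, degrees_of_freedom, approximate upper-tail p-value).
    """
    if len(counts) != len(populations) or len(counts) < 2:
        raise ValueError("need matching lists covering at least two regions")
    if any(p <= 0 for p in populations):
        raise ValueError("populations must be positive to form expected counts")

    total_users = sum(counts)
    total_pop = sum(populations)

    # Sum (observed - expected)^2 / expected, where the expected count is the
    # region's population share times the total number of users.
    stat = 0.0
    for observed, pop in zip(counts, populations):
        expected = total_users * pop / total_pop
        stat += (observed - expected) ** 2 / expected

    df = len(counts) - 1

    # Wilson-Hilferty approximation: (X/df)^(1/3) is roughly normal with
    # mean 1 - 2/(9 df) and variance 2/(9 df).
    z = ((stat / df) ** (1 / 3) - (1 - 2 / (9 * df))) / math.sqrt(2 / (9 * df))
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z > z)
    return stat, df, p_value


# Hypothetical example: four regions with made-up user counts and populations.
stat, df, p = chi_square_proportional([120, 40, 300, 15], [10000, 8000, 12000, 9000])
print(f"chi2 = {stat:.1f}, df = {df}, p = {p:.3g}")
```

With user counts in the millions, even small departures from proportionality produce tiny p-values, so a test like this establishes that the distribution is nonrandom; measuring which factors drive the departure requires a regression model such as the spatial models the abstract mentions.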
