A Large-Scale Empirical Study of Geotagging Behavior on Twitter

Geotagging on social media has become an important proxy for understanding people's mobility and social events. Research that uses geotags to infer public opinion relies on several key assumptions about the behavior of geotagged and non-geotagged users. However, these assumptions have not been fully validated, and this limited understanding of geotagging behavior holds back further use of geotags. In this paper, we present an empirical study of geotagging behavior on Twitter based on more than 40 billion tweets collected from 20 million users. We report three main findings that may challenge these common assumptions. First, different groups of users have different geotagging preferences. For example, fewer than 3% of Korean-speaking users are geotagged, while more than 40% of Indonesian-speaking users use geotags. Second, users who report their locations in their profiles are more likely to use geotags, which may affect how well location prediction systems generalize to non-geotagged users. Third, a strong homophily effect exists in users' geotagging behavior: users tend to connect to friends with similar geotagging preferences.
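
The three findings rest on simple quantities: the share of geotagged users within a group (by language, or by whether a profile location is set), and a comparison of how often friends share the same geotagging status against what random mixing would predict. The following is a minimal Python sketch of those quantities, assuming a hypothetical per-user record format and toy data invented for illustration. It is not the paper's own pipeline, and the random-mixing baseline p^2 + (1 - p)^2 is only one simple choice among several homophily measures.

```python
from collections import defaultdict

# Hypothetical toy records (user_id -> attributes). These values are
# illustrative only and are NOT data from the study.
USERS = {
    "u1": {"lang": "ko", "profile_loc": False, "geotagged": False},
    "u2": {"lang": "ko", "profile_loc": True,  "geotagged": False},
    "u3": {"lang": "id", "profile_loc": True,  "geotagged": True},
    "u4": {"lang": "id", "profile_loc": False, "geotagged": True},
    "u5": {"lang": "id", "profile_loc": False, "geotagged": False},
    "u6": {"lang": "ko", "profile_loc": True,  "geotagged": True},
}

# Hypothetical undirected friendship edges between the toy users.
EDGES = [("u1", "u2"), ("u3", "u4"), ("u4", "u6"),
         ("u3", "u6"), ("u2", "u5"), ("u1", "u3")]


def geotag_rate_by(users, attr):
    """Fraction of geotagged users within each value of `attr`."""
    totals = defaultdict(int)
    tagged = defaultdict(int)
    for info in users.values():
        group = info[attr]
        totals[group] += 1
        tagged[group] += int(info["geotagged"])
    return {g: tagged[g] / totals[g] for g in totals}


def homophily(users, edges):
    """Return (observed, expected) share of edges joining two users with
    the same geotagging status. `expected` assumes random mixing given
    the overall share p of geotagged users: p**2 + (1 - p)**2."""
    same = sum(users[a]["geotagged"] == users[b]["geotagged"] for a, b in edges)
    observed = same / len(edges)
    p = sum(info["geotagged"] for info in users.values()) / len(users)
    expected = p ** 2 + (1 - p) ** 2
    return observed, expected


if __name__ == "__main__":
    print(geotag_rate_by(USERS, "lang"))         # geotag rate per language
    print(geotag_rate_by(USERS, "profile_loc"))  # rate with vs. without profile location
    print(homophily(USERS, EDGES))               # (observed, random-mixing baseline)
```

On this toy data, users with a profile location geotag at a higher rate than those without, and the observed share of same-status friendships exceeds the random-mixing baseline, which is the pattern a homophily effect would produce. A real analysis would also need to control for degree and group size, which this sketch ignores.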
