Exploiting User and Venue Characteristics for Fine-Grained Tweet Geolocation

Which venue is a tweet posted from? We call this a fine-grained geolocation problem. Given an observed tweet, the task is to infer its discrete posting venue, e.g., a specific restaurant. This recovers the venue context and differs from prior work, which geolocates tweets to location coordinates or to cities and neighborhoods. We first conduct an empirical analysis to uncover venue and user characteristics that can improve geolocation. For venues, we observe spatial homophily: venues near each other have more similar tweet content (i.e., text representations) than venues further apart. For users, we observe spatial focus: users are more likely to visit venues near their previous visits. We also find that a substantial proportion of users post at least one geocoded tweet, thus providing location history data. We then propose geolocation models that exploit the spatial homophily and spatial focus characteristics together with posting time information. For each test tweet, our models rank the candidate venues so that the actual posting venue is ranked high. To better tune model parameters, we introduce a learning-to-rank framework. Our best model significantly outperforms state-of-the-art baselines. Furthermore, we show that tweets without any location-indicative words can also be geolocated meaningfully.
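To make the ranking setup concrete, the following is a minimal illustrative sketch and not the models proposed in this work. It assumes a bag-of-words cosine similarity as a stand-in for the text representations, an exponential distance decay as a stand-in for spatial focus, and a fixed mixing weight in place of learning-to-rank; posting time is omitted. The names `rank_venues`, `alpha`, and `decay_km` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def rank_venues(tweet_words, user_history, venues, alpha=0.5, decay_km=1.0):
    """Return venue names sorted from most to least likely posting venue.

    venues: {name: ((lat, lon), [words seen in tweets at that venue])}
    user_history: list of (lat, lon) from the user's earlier geocoded tweets.
    alpha and decay_km are hypothetical knobs, fixed here rather than learned.
    """
    tweet_vec = Counter(tweet_words)
    scores = {}
    for name, (coord, words) in venues.items():
        # Content score: how similar the tweet is to the venue's past tweets.
        text = cosine(tweet_vec, Counter(words))
        # Spatial-focus score: decays with distance to the nearest past visit.
        if user_history:
            nearest = min(haversine_km(coord, h) for h in user_history)
            focus = math.exp(-nearest / decay_km)
        else:
            focus = 0.0
        scores[name] = alpha * text + (1 - alpha) * focus
    return sorted(scores, key=scores.get, reverse=True)


venues = {
    "cafe": ((1.3000, 103.8000), ["coffee", "latte"]),
    "gym": ((1.3000, 103.8010), ["workout", "weights"]),
    "far_cafe": ((1.4000, 103.9000), ["coffee", "latte"]),
}
ranking = rank_venues(["great", "latte"], [(1.3001, 103.8001)], venues)
print(ranking)
```

In this toy setup the two cafes have identical content, so only the user's location history separates them; a tweet with no matching words would be ranked by the spatial-focus term alone.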
