A Survey of Location Prediction on Twitter

Locations, e.g., countries, states, cities, and points of interest, are central to news, emergency events, and people's daily lives. Automatic identification of locations associated with or mentioned in documents has been explored for decades. As one of the most popular online social network platforms, Twitter has attracted a large number of users who send millions of tweets on a daily basis. Due to the worldwide coverage of its users and the real-time freshness of tweets, location prediction on Twitter has gained significant attention in recent years. Research efforts have focused on the new challenges and opportunities brought by the noisy, short, and context-rich nature of tweets. In this survey, we aim to offer an overall picture of location prediction on Twitter. Specifically, we concentrate on the prediction of user home locations, tweet locations, and mentioned locations. We first define the three tasks and review the evaluation metrics. By summarizing the Twitter network, tweet content, and tweet context as potential inputs, we then highlight in a structured way how the problems depend on these inputs. Each dependency is illustrated by a comprehensive review of the corresponding strategies adopted in state-of-the-art approaches. In addition, we briefly review two related problems, i.e., semantic location prediction and point-of-interest recommendation. Finally, we conclude the survey and list future research directions.
