Fine-grained Geolocation of Tweets in Temporal Proximity

In fine-grained tweet geolocation, tweets are linked to the specific venues (e.g., restaurants, shops) from which they were posted. This explicitly recovers the venue context that is essential for applications such as location-based advertising or user profiling. For this task, we focus on geolocating tweets that are contained in tweet sequences. In a tweet sequence, tweets are posted from some latent venue(s) by the same user within a short time interval. This scenario is motivated by two observations: (1) users commonly post multiple tweets within a short time, and (2) most tweets are not geocoded. To geolocate a tweet more accurately, we propose a model that performs query expansion on the tweet (the query) using two novel approaches. The first approach, temporal query expansion, considers users’ staying behavior around venues. The second approach, visitation query expansion, leverages users’ tendency to revisit the same or similar venues. We combine both query expansion approaches via a novel fusion framework and overlay them on a Hidden Markov Model to account for sequential information. In comprehensive experiments across multiple datasets and metrics, we show that our proposed model is more robust and accurate than the baselines.
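To make the overall idea concrete, the following is a minimal sketch, not the model proposed here. It assumes a toy setting in which each venue is described by word counts, each tweet is a (timestamp, words) pair, temporal query expansion is approximated by borrowing words from the other tweets in the sequence with an exponentially decaying weight, and the Hidden Markov Model uses a single hypothetical "stay" probability for transitions. The function names, the decay rate, the smoothing, and the transition model are all illustrative assumptions; visitation query expansion and the fusion framework are omitted.

```python
import math
from collections import Counter

def expand_query(tweets, i, decay=0.01):
    """Weighted bag of words for tweet i (assumed form of temporal expansion).
    Every tweet in the sequence contributes its words with weight
    exp(-decay * time gap in seconds); tweet i itself gets weight 1."""
    t_i = tweets[i][0]
    bag = Counter()
    for t_j, words in tweets:
        weight = math.exp(-decay * abs(t_j - t_i))
        for word in words:
            bag[word] += weight
    return bag

def emission_logprob(bag, venue_counts, vocab_size):
    """Log-likelihood of the weighted bag under an add-one-smoothed
    unigram model of the venue (an illustrative choice of venue model)."""
    total = sum(venue_counts.values())
    return sum(
        weight * math.log((venue_counts.get(word, 0) + 1) / (total + vocab_size))
        for word, weight in bag.items()
    )

def geolocate_sequence(tweets, venues, stay_prob=0.8, decay=0.01):
    """Viterbi decoding: hidden states are venues, observations are the
    expanded queries. Returns one venue name per tweet."""
    names = list(venues)
    n = len(names)
    vocab = {w for counts in venues.values() for w in counts}
    vocab |= {w for _, words in tweets for w in words}

    def trans(a, b):
        # Hypothetical transition model: stay with stay_prob, else move uniformly.
        if n == 1:
            return 0.0
        return math.log(stay_prob if a == b else (1.0 - stay_prob) / (n - 1))

    emis = [
        [emission_logprob(expand_query(tweets, i, decay), venues[v], len(vocab))
         for v in names]
        for i in range(len(tweets))
    ]
    score = [math.log(1.0 / n) + e for e in emis[0]]  # uniform prior over venues
    backptrs = []
    for i in range(1, len(tweets)):
        new_score, ptr = [], []
        for b in range(n):
            best_a = max(range(n), key=lambda a: score[a] + trans(a, b))
            ptr.append(best_a)
            new_score.append(score[best_a] + trans(best_a, b) + emis[i][b])
        score = new_score
        backptrs.append(ptr)

    # Trace back the best path from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return [names[s] for s in reversed(path)]

# Toy data (made up for illustration only).
venues = {
    "cafe": Counter({"coffee": 5, "latte": 3, "espresso": 2}),
    "gym": Counter({"treadmill": 5, "workout": 3, "weights": 2}),
}
tweets = [(0, ["coffee"]), (60, ["so", "tired"]), (120, ["latte"])]
print(geolocate_sequence(tweets, venues))
```

In this toy sequence the middle tweet carries no venue-specific words of its own; under the assumptions above, it can only be assigned a venue through the words borrowed from its temporal neighbors and through the transition model.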
