Locality-adapted Kernel Densities for Tweet Localization

We propose a location prediction method for tweets based on the geographical probability distribution of their terms over a region. The probabilities are calculated using Kernel Density Estimation (KDE), where the bandwidth of the kernel function is determined separately for each term according to how indicative of location that term is. The location of a new tweet is predicted by combining the probability distributions of its terms, weighted by their information gain ratio. The method relies on statistical approaches and requires no parameter tuning. Experiments on three tweet sets from different regions of the world indicate a significant improvement in prediction accuracy over state-of-the-art methods.
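The pipeline described in the abstract (a per-term density, a per-term bandwidth, and a weighted combination) can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: a Gaussian kernel, coordinates treated as planar, a Silverman-style rule of thumb standing in for the paper's indicativeness-based bandwidth rule, a weighted sum as the combination step, and term weights supplied by the caller rather than computed as information gain ratios. All function and parameter names are hypothetical.

```python
import numpy as np


def rule_of_thumb_bandwidth(points, floor=1e-3):
    """Per-term bandwidth from a Silverman-style rule of thumb.

    Assumption: this stands in for the paper's bandwidth rule, which is
    not given in the abstract. Tightly clustered terms get a small
    bandwidth, widely scattered terms a large one.
    """
    n, d = points.shape
    if n < 2:
        return floor
    sigma = float(points.std(axis=0, ddof=1).mean())
    return max(sigma * (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4)), floor)


def term_density(points, grid, h):
    """2-D Gaussian KDE of one term's coordinates, evaluated on the grid."""
    diff = grid[:, None, :] - points[None, :, :]   # shape (G, N, 2)
    sq_dist = (diff ** 2).sum(axis=2)              # shape (G, N)
    kernel = np.exp(-sq_dist / (2.0 * h * h))
    return kernel.sum(axis=1) / (len(points) * 2.0 * np.pi * h * h)


def predict_location(tweet_terms, term_points, term_weights, grid):
    """Return the grid point with the highest weighted sum of term densities.

    term_points:  dict mapping term -> (N, 2) array of training coordinates
    term_weights: dict mapping term -> non-negative weight (hypothetical
                  stand-in for the information gain ratio)
    Returns None when no term of the tweet was seen in training.
    """
    total = np.zeros(len(grid))
    seen = False
    for term in tweet_terms:
        if term not in term_points:
            continue
        pts = term_points[term]
        h = rule_of_thumb_bandwidth(pts)
        total += term_weights.get(term, 0.0) * term_density(pts, grid, h)
        seen = True
    if not seen:
        return None
    return grid[int(np.argmax(total))]
```

In this sketch a term whose training tweets cluster tightly receives a narrow kernel and a sharp peak, while a term spread over the whole region receives a wide, flat density, so a locally used term dominates the prediction even before the weights are applied.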
