A Google Trends spatial clustering approach for a worldwide Twitter user geolocation

Abstract: User location data is valuable for diverse social media analytics. In this paper, we address the non-trivial task of estimating the city-level location of Twitter users worldwide from their historical tweets alone. We propose a purely unsupervised approach based on a synthetic geographic sampling of Google Trends (GT) city-level frequencies of tweet nouns, combined with three clustering algorithms. The approach was validated empirically on a recently collected dataset with 3,268 worldwide city-level locations of Twitter users, obtaining competitive results when compared with a state-of-the-art Word Distribution (WD) user location estimation method. The best overall results were achieved by the GT noun DBSCAN (GTN-DB) method, which is computationally fast and correctly predicts the ground truth locations of 15%, 23%, 39% and 58% of the users for tolerance distances of 250 km, 500 km, 1,000 km and 2,000 km, respectively.
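The tolerance-distance figures in the abstract can be read as the share of users whose predicted city lies within a given number of kilometres of the ground truth location. The following is a minimal sketch of that metric, assuming great-circle (haversine) distance and a mean Earth radius of 6,371 km. The function names and the sample coordinates are hypothetical illustrations, not taken from the paper, and the paper's exact distance computation may differ.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def accuracy_within(predicted, truth, tolerance_km):
    """Fraction of users whose predicted location is within tolerance_km of the truth.

    predicted, truth: equal-length sequences of (lat, lon) tuples, one per user.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    if not truth:
        raise ValueError("at least one user is required")
    hits = sum(
        1
        for (plat, plon), (tlat, tlon) in zip(predicted, truth)
        if haversine_km(plat, plon, tlat, tlon) <= tolerance_km
    )
    return hits / len(truth)


if __name__ == "__main__":
    # Hypothetical predictions and ground truth for two users (approximate city coordinates).
    predicted = [(38.72, -9.14), (48.86, 2.35)]   # Lisbon, Paris
    truth = [(41.15, -8.61), (40.71, -74.01)]     # Porto, New York
    for d in (250, 500, 1000, 2000):
        print(d, accuracy_within(predicted, truth, d))
```

Evaluating the same predictions at several tolerances, as in the loop above, yields the kind of accuracy-versus-distance profile that the abstract summarises with its four percentages.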
