A multilayer recognition model for Twitter user geolocation

Abstract

Geolocation is important for many emerging applications such as disaster management and recommendation systems. In this paper, we propose a multilayer recognition model (MRM) that predicts the city-level location of social network users based solely on the content of their tweets. Through a series of optimizations such as entity selection, spatial clustering, and outlier filtering, suitable features are extracted to model the geographic coordinates of Twitter users. Multinomial Naive Bayes is then applied to classify the data into different groups. The model is evaluated against an existing algorithm on Twitter datasets. The experimental results show that our method achieves a better prediction accuracy of 54.82% on the test set and reduces the average error to 400.97 miles at best.
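To make the classification and evaluation steps concrete, the following is a minimal sketch and not the authors' MRM. It assumes each user is already reduced to a list of tokens and omits entity selection, spatial clustering, and outlier filtering entirely. The toy tokens, the city coordinates, and the function names (`train_multinomial_nb`, `predict`, `haversine_miles`) are illustrative assumptions. It shows a Multinomial Naive Bayes classifier with Laplace smoothing and a great-circle distance that could be used to report error in miles.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes on tokenized documents (Laplace smoothing)."""
    vocab = {tok for doc in docs for tok in doc}
    docs_per_class = Counter(labels)
    token_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)
    model = {}
    for label, n_docs in docs_per_class.items():
        denom = sum(token_counts[label].values()) + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_docs / len(docs)),
            "log_like": {t: math.log((token_counts[label][t] + alpha) / denom)
                         for t in vocab},
        }
    return model

def predict(model, doc):
    """Return the label with the highest posterior; unseen tokens are skipped."""
    def score(label):
        params = model[label]
        return params["log_prior"] + sum(
            params["log_like"][t] for t in doc if t in params["log_like"])
    return max(model, key=score)

def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(h))  # 3958.8 = mean Earth radius in miles

# Hypothetical toy data: one token list per user, labeled with a home city.
docs = [["broadway", "subway", "yankees"], ["subway", "brooklyn"],
        ["hollywood", "lakers", "beach"], ["beach", "lakers", "freeway"]]
labels = ["New York", "New York", "Los Angeles", "Los Angeles"]
city_coords = {"New York": (40.7128, -74.0060), "Los Angeles": (34.0522, -118.2437)}

model = train_multinomial_nb(docs, labels)
predicted = predict(model, ["lakers", "subway", "beach"])
# Error distance if this user's true city were New York.
error = haversine_miles(city_coords[predicted], city_coords["New York"])
```

Averaging `error` over all test users would give a mean error distance of the kind reported above, and the share of users whose predicted city matches the true one would give the accuracy.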
