Location Inference of Social Media Posts at Hyper-Local Scale

This paper describes an approach to infer the location of a social media post at a hyper-local scale from its content, given the knowledge that the post originates from a larger area such as a city or even a state. The approach comprises three components: (i) a discriminative classifier, Logistic Regression (LR), which selects, from a set of candidate sub-regions, the one from which a post most probably originated; (ii) a clustering technique, k-means, which adaptively partitions the larger geographic region into sub-regions based on the density of the posts; and (iii) a range of techniques to extract a set of hyper-local words from the posts, which are fed as features to the LR classifier. The approach is evaluated on a large corpus of tweets collected from Twitter over New York City (NYC), Washington DC, and the state of Connecticut (CT). The results show that our approach can geo-locate tweets within 1.72 km for NYC, 12.5 km for DC, and 37.00 km for CT. These results from three geographically and socially diverse regions suggest that our approach outperforms contemporary methods, which estimate locations within ranges of hundreds of kilometers. It can thus support a wide array of services such as location-based advertising and disaster and emergency response.
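The abstract names the three components without specifying their details. The following is a minimal, standard-library-only sketch of how they could fit together, under several assumptions that are not taken from the paper: sub-regions come from plain Lloyd's k-means on raw latitude/longitude (Euclidean distance in degrees, tolerable only within a single city); the hyper-local word step is a hypothetical filter (`hyperlocal_vocab`, with illustrative `min_count` and `min_share` parameters) that keeps words whose posts concentrate in one cluster; the classifier is multinomial logistic regression trained by plain SGD; and a post's estimated location is the centroid of its most probable sub-region, with error measured as great-circle (haversine) distance. The toy posts are invented for illustration.

```python
import math
import random
from collections import Counter, defaultdict

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def nearest(p, centroids):
    # Assumption: squared Euclidean distance in degrees, a simplification
    # that is only reasonable inside a small region such as one city.
    return min(range(len(centroids)),
               key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2)

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's k-means on (lat, lon); tends to place more centroids where posts are dense."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return centroids, [nearest(p, centroids) for p in points]

def hyperlocal_vocab(docs, labels, min_count=2, min_share=0.6):
    """Hypothetical filter: keep words whose posts concentrate in one sub-region."""
    per_word = defaultdict(Counter)
    for tokens, lab in zip(docs, labels):
        for w in set(tokens):
            per_word[w][lab] += 1
    return {w for w, cnt in per_word.items()
            if sum(cnt.values()) >= min_count
            and max(cnt.values()) / sum(cnt.values()) >= min_share}

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def scores(feats, W, b):
    """Linear score of each sub-region for a set of binary word features."""
    return [b[c] + sum(W[c].get(w, 0.0) for w in feats) for c in range(len(b))]

def train_lr(docs, labels, vocab, k, epochs=200, lr=0.5, l2=1e-3):
    """Multinomial logistic regression on binary bag-of-words, trained with plain SGD."""
    W, b = [defaultdict(float) for _ in range(k)], [0.0] * k
    for _ in range(epochs):
        for tokens, y in zip(docs, labels):
            feats = sorted(set(tokens) & vocab)
            probs = softmax(scores(feats, W, b))
            for c in range(k):
                g = probs[c] - (1.0 if c == y else 0.0)  # d(log-loss)/d(score_c)
                b[c] -= lr * g
                for w in feats:
                    W[c][w] -= lr * (g + l2 * W[c][w])
    return W, b

def predict(tokens, W, b, vocab, centroids):
    """Return the centroid of the most probable sub-region."""
    s = scores(sorted(set(tokens) & vocab), W, b)
    return centroids[max(range(len(b)), key=s.__getitem__)]

# Invented toy data: three posts near Times Square, three near Prospect Park.
posts = [
    ((40.7580, -73.9855), "times square lights tonight"),
    ((40.7590, -73.9845), "broadway show near times square"),
    ((40.7570, -73.9860), "broadway lights are amazing"),
    ((40.6602, -73.9690), "morning run in prospect park"),
    ((40.6610, -73.9700), "prospect park picnic tonight"),
    ((40.6595, -73.9680), "picnic and run at the park"),
]
points = [p for p, _ in posts]
docs = [text.split() for _, text in posts]
centroids, labels = kmeans(points, k=2)
vocab = hyperlocal_vocab(docs, labels)  # "tonight" appears in both areas, so it is dropped
W, b = train_lr(docs, labels, vocab, k=2)
guess = predict("sunset over prospect park".split(), W, b, vocab, centroids)
print(round(haversine_km(guess, (40.6602, -73.9690)), 2), "km error")
```

In this toy run, "tonight" occurs in both areas and is filtered out, while "prospect" and "park" pull the query toward the Brooklyn cluster. The haversine distance between the predicted centroid and a post's true coordinates is the per-post error; aggregating it over a held-out set gives a distance figure of the kind reported above.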
