Tracking geographical locations using a geo-aware topic model for analyzing social media data

Abstract Tracking how discussion topics evolve in social media, and where those topics are discussed geographically over time, can provide useful information for many purposes. In crisis management, knowing a topic's current geographical location could provide vital information about where resources should be allocated, or even which resources. This paper describes an attempt to track online discussions geographically over time. A distributed, geo-aware streaming latent Dirichlet allocation model was developed to recognize the locations of topics in unstructured text. To evaluate the model, it was implemented and used for automatic discovery and geographical tracking of election topics during parts of the 2016 American presidential primary elections. The inferred topic locations correlated with the actual election locations, and the model provided better geolocation classification than a keyword-based approach.
