Spatio-Temporal Event Detection from Multiple Data Sources

The proliferation of Internet-enabled smartphones has ushered in an era where events are reported on social media websites such as Twitter and Facebook. However, the short text nature of social media posts, combined with a large volume of noise present in such datasets makes event detection challenging. This problem can be alleviated by using other sources of information, such as news articles, that employ a precise and factual vocabulary, and are more descriptive in nature. In this paper, we propose Spatio-Temporal Event Detection (STED), a probabilistic model to discover events, their associated topics, time of occurrence, and the geospatial distribution from multiple data sources, such as news and Twitter. The joint modeling of news and Twitter enables our model to distinguish events from other noisy topics present in Twitter data. Furthermore, the presence of geocoordinates and timestamps in tweets helps find the spatio-temporal distribution of the events. We evaluate our model on a large corpus of Twitter and news data, and our experimental results show that STED can effectively discover events, and outperforms state-of-the-art techniques.

[1]  Nando de Freitas,et al.  An Introduction to MCMC for Machine Learning , 2004, Machine Learning.

[2]  James Allan,et al.  Text classification and named entities for new event detection , 2004, SIGIR '04.

[3]  Thomas Hofmann,et al.  Probabilistic Latent Semantic Indexing , 1999, SIGIR Forum.

[4]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[5]  Sergej Sizov,et al.  GeoFolk: latent spatial semantics in web 2.0 social media , 2010, WSDM '10.

[6]  Sandor Laki,et al.  On a keyword-lifecycle model for real-time event detection in social network data , 2013, 2013 IEEE 4th International Conference on Cognitive Infocommunications (CogInfoCom).

[7]  Yutaka Matsuo,et al.  Earthquake shakes Twitter users: real-time event detection by social sensors , 2010, WWW '10.

[8]  Mizuki Morita,et al.  Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter , 2011, EMNLP.

[9]  Kathleen M. Carley,et al.  Microblog Sentiment Topic Model , 2016, 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW).

[10]  Hongfei Yan,et al.  Comparing Twitter and Traditional Media Using Topic Models , 2011, ECIR.

[11]  Julio Gonzalo,et al.  Towards real-time summarization of scheduled events from twitter streams , 2012, HT '12.

[12]  Wei Lu,et al.  A Probabilistic Geographical Aspect-Opinion Model for Geo-Tagged Microblogs , 2017, 2017 IEEE International Conference on Data Mining (ICDM).

[13]  Regina Barzilay,et al.  Event Discovery in Social Media Feeds , 2011, ACL.

[14]  Padhraic Smyth,et al.  Analyzing Entities and Topics in News Articles Using Statistical Topic Models , 2006, ISI.

[15]  Nick Koudas,et al.  TwitterMonitor: trend detection over the twitter stream , 2010, SIGMOD Conference.

[16]  Alexander J. Smola,et al.  Discovering geographical topics in the twitter stream , 2012, WWW.

[17]  Timothy Baldwin,et al.  Automatic Evaluation of Topic Coherence , 2010, NAACL.

[18]  Brendan T. O'Connor,et al.  A Latent Variable Model for Geographic Lexical Variation , 2010, EMNLP.

[19]  Wei Lo,et al.  A Bayesian Graphical Model to Discover Latent Events from Twitter , 2015, ICWSM.

[20]  Mark Steyvers,et al.  Finding scientific topics , 2004, Proceedings of the National Academy of Sciences of the United States of America.