Twitter Topic Modeling for Breaking News Detection

Social media platforms such as Twitter have become increasingly popular for disseminating and discussing current events. Twitter lets people share stories they find interesting with their followers and post updates on what is happening around them. In this paper, we attempt to use topic models of tweets in real time to identify breaking news. Two methods, Latent Dirichlet Allocation (LDA) and the Hierarchical Dirichlet Process (HDP), are tested in two configurations: with each tweet in the training corpus treated as a separate document, and with all tweets from a single user combined into one document. The second configuration emulates Author-Topic modeling (AT-modeling). The methods are evaluated through manual scoring of modeling accuracy by volunteer participants. The experiments indicate that real-time topic modeling of tweets is not suitable for detecting breaking news by itself, but may be useful for analyzing and describing news tweets.
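
The two ways of forming documents described above can be illustrated with a minimal sketch. The toy tweets, the whitespace tokenizer, and the function names are assumptions made for illustration only; the abstract does not specify the preprocessing actually used. The resulting token lists are the kind of input an LDA or HDP implementation would typically take.

```python
from collections import defaultdict

# Hypothetical toy data: (user, tweet text) pairs. Not from the paper.
tweets = [
    ("alice", "earthquake felt downtown just now"),
    ("bob", "great coffee this morning"),
    ("alice", "aftershock reported near the harbor"),
]


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, deliberately simple tokenizer)."""
    return text.lower().split()


def per_tweet_documents(tweets):
    """Configuration 1: every tweet is its own document."""
    return [tokenize(text) for _, text in tweets]


def per_user_documents(tweets):
    """Configuration 2: all tweets by one user are pooled into a single
    document, which approximates author-topic modeling."""
    pooled = defaultdict(list)
    for user, text in tweets:
        pooled[user].extend(tokenize(text))
    return dict(pooled)


tweet_docs = per_tweet_documents(tweets)
user_docs = per_user_documents(tweets)

print(len(tweet_docs), "per-tweet documents")
print(len(user_docs), "per-user documents")
```

Pooling by user yields fewer but longer documents, which gives the topic model more word co-occurrence evidence per document than a single short tweet does.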
