ET-LDA: Joint Topic Modeling for Aligning Events and their Twitter Feedback

During broadcast events such as the Super Bowl and the U.S. presidential and primary debates, Twitter has become the de facto platform for crowds to share perspectives and commentary. Given an event and an associated large-scale collection of tweets, two fundamental research problems have received increasing attention in recent years. One is to extract the topics covered by the event and the tweets; the other is to segment the event. So far these problems have been viewed separately and studied in isolation. In this work, we argue that they are in fact interdependent and should be addressed together. We develop a joint Bayesian model that performs topic modeling and event segmentation in one unified framework. We evaluate the proposed model both quantitatively and qualitatively on two large-scale tweet datasets associated with two events from different domains, and show that it improves significantly over baseline models.
