ET-LDA: Joint Topic Modeling For Aligning, Analyzing and Sensemaking of Public Events and Their Twitter Feeds

Social media channels such as Twitter have emerged as popular platforms for crowds to respond to public events such as speeches, sports, and debates. This offers a substantial opportunity to understand how an event is received on social media, but it comes with significant technical challenges. In particular, given an event and an associated large-scale collection of tweets, we need approaches that effectively align tweets with the parts of the event they refer to. This in turn raises two questions: how to segment the event into smaller yet meaningful parts, and how to determine whether a tweet is a general one about the entire event or a specific one aimed at a particular segment of the event. In this work, we present ET-LDA, an effective method for aligning an event and its tweets through joint statistical modeling of the topical influences between the event and its associated tweets. The model enables automatic segmentation of the event and the characterization of tweets into two categories: (1) episodic tweets, which respond specifically to the content of individual segments of the event, and (2) steady tweets, which respond generally to the event as a whole. We present an efficient inference method for this model and a comprehensive evaluation of its effectiveness against existing methods. In particular, through a user study, we demonstrate that users find the topics, the segments, the alignment, and the episodic tweets discovered by ET-LDA to be of higher quality and more interesting than those produced by state-of-the-art methods, with improvements in the range of 18-41%.
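The abstract does not spell out the generative process, so the following is only a minimal sketch of one plausible reading of the episodic/steady distinction. It assumes that each tweet word first draws a binary switch: an episodic word borrows its topic from one segment of the event, while a steady word draws its topic from a general, event-wide mixture. All names here (`lam`, `segment_weights`, `segment_topics`, `general_topics`, `topic_words`) are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import random


def draw(rng, probs):
    """Draw an index from a discrete distribution given as a list of weights."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def sample_tweet_word(rng, lam, segment_weights, segment_topics,
                      general_topics, topic_words):
    """Sample one tweet word under the assumed episodic/steady switch.

    lam             -- assumed probability that a word is episodic
    segment_weights -- assumed preference of the tweet over event segments
    segment_topics  -- one topic mixture per event segment
    general_topics  -- event-wide topic mixture used for steady words
    topic_words     -- one word distribution per topic
    Returns (word_id, topic_id, segment_id), with segment_id None if steady.
    """
    if rng.random() < lam:
        # Episodic: pick a segment, then a topic from that segment's mixture.
        segment = draw(rng, segment_weights)
        topic = draw(rng, segment_topics[segment])
    else:
        # Steady: the topic comes from the general mixture, tied to no segment.
        segment = None
        topic = draw(rng, general_topics)
    word = draw(rng, topic_words[topic])
    return word, topic, segment


if __name__ == "__main__":
    rng = random.Random(0)
    segment_topics = [[0.9, 0.1], [0.2, 0.8]]   # two segments, two topics
    topic_words = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]  # three-word vocabulary
    for _ in range(5):
        print(sample_tweet_word(rng, 0.6, [0.5, 0.5], segment_topics,
                                [0.5, 0.5], topic_words))
```

Under these assumptions, aligning a tweet with the event amounts to inferring, for each word, whether the switch was on and which segment was chosen; a tweet dominated by episodic words would be treated as episodic, and one dominated by steady words as steady.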
