Automatic Summarization of Events from Social Media

Social media services such as Twitter generate a phenomenal volume of content for most real-world events on a daily basis. Digging through the noise and redundancy to understand the important aspects of that content is a very challenging task. We propose a search and summarization framework that extracts relevant, representative tweets from a time-ordered sample of tweets to generate a coherent and concise summary of an event. We introduce two topic models that take advantage of temporal correlation in the data to extract relevant tweets for summarization. The framework is evaluated on Twitter data from four real-world events, using Wikipedia articles on the events as well as human readers (MTurkers) on Amazon Mechanical Turk (MTurk). Both experiments show that the proposed models outperform traditional LDA and lead to informative summaries.
