A graph based clustering technique for tweet summarization

Twitter is a very popular online social networking site, where hundreds of millions of tweets are posted every day by millions of users. Twitter is now considered one of the fastest and most popular communication media, and is frequently used to keep track of recent events or news stories. Although tweets related to a particular event or news story can easily be found using keyword matching, many of these tweets are likely to contain semantically identical information. A user who wants to keep track of an event or news story would find it tedious to read every tweet when many carry identical or redundant information. Hence, good techniques for summarizing large numbers of tweets are desirable. In this work, we propose a graph-based approach for summarizing tweets: a graph is first constructed from the similarity among tweets, and community detection techniques are then applied to the graph to cluster similar tweets. Finally, a representative tweet is chosen from each cluster for inclusion in the summary. The similarity among tweets is measured using various features, including features based on WordNet synsets, which help capture the semantic similarity among tweets. The proposed approach achieves better performance than SumBasic, an existing summarization technique.
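The three steps described above (build a similarity graph, cluster it, pick one tweet per cluster) can be illustrated with a minimal sketch. It is not the method evaluated in this work: Jaccard overlap of word sets stands in for the WordNet-based features, connected components of the thresholded graph stand in for community detection, and the representative is taken to be the tweet with the largest total similarity to its cluster. The function names, the `threshold` parameter and its default value are all assumptions made for illustration.

```python
import re
from itertools import combinations


def tokens(tweet):
    """Lowercased word set; a crude stand-in for richer similarity features."""
    return set(re.findall(r"[a-z0-9']+", tweet.lower()))


def jaccard(a, b):
    """Jaccard overlap of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def summarize(tweets, threshold=0.5):
    """Return one representative tweet per cluster of similar tweets."""
    toks = [tokens(t) for t in tweets]
    n = len(tweets)

    # Step 1: similarity graph. Nodes are tweet indices; an edge joins two
    # tweets whose similarity reaches the (assumed) threshold.
    adj = {i: {} for i in range(n)}
    for i, j in combinations(range(n), 2):
        s = jaccard(toks[i], toks[j])
        if s >= threshold:
            adj[i][j] = s
            adj[j][i] = s

    # Step 2: clustering. Connected components are used here as a simple
    # substitute for a community detection algorithm.
    seen = set()
    clusters = []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))

    # Step 3: representative selection. Pick the tweet with the highest
    # summed edge weight; ties go to the earliest tweet.
    summary = []
    for component in clusters:
        best = max(component, key=lambda i: (sum(adj[i].values()), -i))
        summary.append(tweets[best])
    return summary
```

Calling `summarize` on a list of tweet strings returns one string per cluster, in the order in which the clusters are first encountered. Lowering `threshold` merges more tweets into each cluster and shortens the summary; raising it does the opposite.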
