Comparing Twitter Summarization Algorithms for Multiple Post Summaries

Due to the sheer volume of text generated by a micro log site like Twitter, it is often difficult to fully understand what is being said about various topics. In an attempt to understand micro logs better, this paper compares algorithms for extractive summarization of micro log posts. We present two algorithms that produce summaries by selecting several posts from a given set. We evaluate the generated summaries by comparing them to both manually produced summaries and summaries produced by several leading traditional summarization systems. In order to shed light on the special nature of Twitter posts, we include extensive analysis of our results, some of which are unexpected.

[1]  Robert Tibshirani,et al.  Estimating the number of clusters in a data set via the gap statistic , 2000 .

[2]  Max Kaufmann Syntactic Normalization of Twitter Messages , 2010 .

[3]  Rada Mihalcea,et al.  TextRank: Bringing Order into Text , 2004, EMNLP.

[4]  G. Karypis,et al.  Criterion Functions for Document Clustering ∗ Experiments and Analysis , 2001 .

[5]  Jugal K. Kalita,et al.  Experiments in Microblog Summarization , 2010, 2010 IEEE Second International Conference on Social Computing.

[6]  Inderjit S. Dhillon,et al.  Co-clustering documents and words using bipartite spectral graph partitioning , 2001, KDD '01.

[7]  Eduard H. Hovy,et al.  Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics , 2003, NAACL.

[8]  Sergei Vassilvitskii,et al.  k-means++: the advantages of careful seeding , 2007, SODA '07.

[9]  Gordon I. McCalla,et al.  A Response to the Need for Summary Responses , 1984, COLING.

[10]  Sergey Brin,et al.  The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.

[11]  J. Kalita,et al.  Automatic Summarization of Twitter Topics , 2010 .

[12]  Jugal K. Kalita,et al.  Summarization as feature selection for text categorization , 2001, CIKM '01.

[13]  Dragomir R. Radev,et al.  LexRank: Graph-based Centrality as Salience in Text Summarization , 2004 .

[14]  Johannes Gehrke,et al.  CACTUS—clustering categorical data using summaries , 1999, KDD '99.

[15]  Wai Lam,et al.  MEAD - A Platform for Multidocument Multilingual Text Summarization , 2004, LREC.

[16]  Lin Dai,et al.  Subtopic-based multi-document summarization , 2009, 2009 International Conference on Machine Learning and Cybernetics.

[17]  Gordon I. McCalla,et al.  Summarizing Natural Language Database Responses , 1986, Comput. Linguistics.

[18]  Regina Barzilay,et al.  Towards Multidocument Summarization by Reformulation: Progress and Prospects , 1999, AAAI/IAAI.

[19]  Jugal K. Kalita,et al.  Summarizing Microblogs Automatically , 2010, NAACL.

[20]  Dragomir R. Radev,et al.  LexRank: Graph-based Lexical Centrality as Salience in Text Summarization , 2004, J. Artif. Intell. Res..

[21]  Dragomir R. Radev,et al.  Experiments in Single and Multi-Document Summarization Using MEAD , 2001 .

[22]  Ani Nenkova,et al.  Beyond SumBasic: Task-focused summarization with sentence simplification and lexical expansion , 2007, Information Processing & Management.