Towards Holistic Summarization – Selecting Summaries, Not Sentences

In this paper we present a novel method for automatic text summarization through text extraction, using computational semantics. The new idea is to view the extracted text as a whole and compute a score for the total impact of the summary, instead of ranking, for instance, individual sentences. A greedy search strategy explores the space of possible summaries and selects the highest-scoring summary among those found. The aim has been to construct a summarizer that can be assembled quickly, using only a few basic language tools, for languages that lack large amounts of structured or annotated data or advanced tools for linguistic processing. The proposed method is largely language independent, though in this paper we evaluate it only on English, using ROUGE scores on texts from, among other sources, DUC 2004 task 2. On this task our method performs better than several of the systems evaluated there, but worse than the best systems.
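To make the idea of scoring whole summaries concrete, the following is a minimal sketch of a greedy search over candidate summaries. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the scoring function (cosine similarity between term-count vectors of the summary and of the full text), the word budget `max_words`, and all function names are hypothetical stand-ins for the semantic scoring described above.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased term counts; a crude stand-in (assumption) for a semantic vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def summary_score(selected, sentences, doc_vector):
    """Score the candidate summary as a whole, not sentence by sentence."""
    summary_text = " ".join(sentences[i] for i in sorted(selected))
    return cosine(bag_of_words(summary_text), doc_vector)


def greedy_summary(sentences, max_words):
    """Greedily add the sentence that most improves the whole-summary score."""
    doc_vector = bag_of_words(" ".join(sentences))
    selected = set()
    best_score = 0.0
    while True:
        best_candidate = None
        used = sum(len(sentences[i].split()) for i in selected)
        for i, sentence in enumerate(sentences):
            # Skip sentences already chosen or that would exceed the budget.
            if i in selected or used + len(sentence.split()) > max_words:
                continue
            score = summary_score(selected | {i}, sentences, doc_vector)
            if score > best_score:
                best_score, best_candidate = score, i
        if best_candidate is None:
            break  # no remaining sentence improves the summary score
        selected.add(best_candidate)
    return [sentences[i] for i in sorted(selected)], best_score
```

Calling `greedy_summary(sentences, max_words=100)` on a list of sentence strings returns the selected sentences in document order together with the score of that summary. Because each step evaluates the candidate summary in its entirety, a sentence that merely repeats already covered content tends to add little to the score, which is the intended contrast with ranking sentences independently.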
