SUMMAC: a text summarization evaluation

The TIPSTER Text Summarization Evaluation (SUMMAC) has developed several new extrinsic and intrinsic methods for evaluating summaries. It has established definitively that automatic text summarization is very effective in relevance assessment tasks on news articles. Summaries as short as 17% of full text length sped up decision-making by almost a factor of 2 with no statistically significant degradation in accuracy. Analysis of feedback forms filled in after each decision indicated that the intelligibility of present-day machine-generated summaries is high. Systems that performed most accurately in the production of indicative and informative topic-related summaries used term frequency and co-occurrence statistics, and vocabulary overlap comparisons between text passages. However, in the absence of a topic, these statistical methods do not appear to provide any additional leverage: in the case of generic summaries, the systems were indistinguishable in accuracy. The paper discusses some of the tradeoffs and challenges faced by the evaluation, and also lists some of the lessons learned, impacts, and possible future directions. The evaluation methods used in the SUMMAC evaluation are of interest to both summarization evaluation as well as evaluation of other ‘output-related’ NLP technologies, where there may be many potentially acceptable outputs, with no automatic way to compare them.

[1]  Eduard Hovy,et al.  Automated Text Summarization in SUMMARIST , 1997, ACL 1997.

[2]  Inderjeet Mani,et al.  Machine Learning of Generic and User-Focused Summarization , 1998, AAAI/IAAI.

[3]  Marc Moens,et al.  Argumentative Classification of Extracted Sentences as a First Step Towards Flexible Abstracting , 1999 .

[4]  Kathleen R. McKeown,et al.  Summarization Evaluation Methods: Experiments and Analysis , 1998 .

[5]  Daniel Marcu,et al.  The automatic construction of large-scale corpora for summarization research , 1999, SIGIR '99.

[6]  Inderjeet Mani Multi-document summarization , 2001 .

[7]  Graeme Hirst,et al.  Lexical Cohesion Computed by Thesaural relations as an indicator of the structure of text , 1991, CL.

[8]  Jean-Luc Minel,et al.  How to Appreciate the Quality of Automatic Text Summarization? Examples of FAN and MLUCE Protocols and their Results on SERAPHIN , 1997, ACL 1997.

[9]  Francine Chen,et al.  A trainable document summarizer , 1995, SIGIR '95.

[10]  Udo Hahn,et al.  Text condensation as knowledge base abstraction , 1988, [1988] Proceedings. The Fourth Conference on Artificial Intelligence Applications.

[11]  H. P. Edmundson,et al.  New Methods in Automatic Extracting , 1969, JACM.

[12]  Chris D. Paice,et al.  Constructing literature abstracts by computer: Techniques and prospects , 1990, Inf. Process. Manag..

[13]  Jacob Cohen Statistical Power Analysis for the Behavioral Sciences , 1969, The SAGE Encyclopedia of Research Design.

[14]  Ellen M. Voorhees Variations in relevance judgments and the measurement of retrieval effectiveness , 2000, Inf. Process. Manag..

[15]  Gustave J. Rath,et al.  The formation of abstracts by the selection of sentences , 1961 .

[16]  Horacio Saggion,et al.  Concept Identification and Presentation in the Context of Technical Text Summarization , 2000 .

[17]  Mark T. Maybury,et al.  Advances in Automatic Text Summarization , 1999 .

[18]  Robert L. Donaway,et al.  A Comparison of Rankings Produced by Summarization Evaluation Measures , 2000 .

[19]  Seiji Miike,et al.  A full-text retrieval system with a dynamic abstract generation function , 1994, SIGIR '94.

[20]  R. Kirk Experimental Design: Procedures for the Behavioral Sciences , 1970 .

[21]  Inderjeet Mani,et al.  Improving Summaries by Revising Them , 1999, ACL.

[22]  Mark T. Maybury,et al.  Generating Summaries from Event Data , 1995, Inf. Process. Manag..

[23]  Inderjeet Mani,et al.  The Challenges of Automatic Summarization , 2000, Computer.

[24]  Ellen M. Voorhees,et al.  The fifth text REtrieval conference (TREC-5) , 1997 .

[25]  Lisa F. Rau,et al.  Automatic Condensation of Electronic Publications by Sentence Selection , 1995, Inf. Process. Manag..

[26]  Jacques Robin,et al.  Revision-based generation of natural language summaries providing historical background: corpus-based analysis, design, implementation and evaluation , 1995 .

[27]  Michael McGill,et al.  Introduction to Modern Information Retrieval , 1983 .

[28]  Mark Sanderson,et al.  Advantages of query biased summaries in information retrieval , 1998, SIGIR '98.

[29]  Inderjeet Mani,et al.  Summarizing Similarities and Differences Among Related Documents , 1997, Information Retrieval.

[30]  Ralph Grishman,et al.  Message Understanding Conference- 6: A Brief History , 1996, COLING.

[31]  Karen Spärck Jones Automatic summarising: factors and directions , 1998, ArXiv.

[32]  Kathleen McKeown,et al.  The decomposition of human-written summary sentences , 1999, SIGIR '99.

[33]  Hans Peter Luhn,et al.  The Automatic Creation of Literature Abstracts , 1958, IBM J. Res. Dev..

[34]  Mark T. Maybury,et al.  Automatic Summarization , 2002, Computational Linguistics.

[35]  Gwyneth Doherty-Sneddon,et al.  The Reliability of a Dialogue Structure Coding Scheme , 1997, CL.

[36]  Daniel Marcu,et al.  An Evaluation Road Map for Summarization Research , 2000 .

[37]  Inderjeet Mani,et al.  Multi-Document Summarization by Graph Search and Matching , 1997, AAAI/IAAI.

[38]  神門 典子,et al.  NTCIR workshop 2 meeting : proceedings of the Second NTCIR workshop meeting on evaluation of Chinese & Japanese text retrieval and text summarization , 2001 .

[39]  Daniel Marcu,et al.  Statistics-Based Summarization - Step One: Sentence Compression , 2000, AAAI/IAAI.

[40]  Kathleen R. McKeown,et al.  Generating natural language summaries from multiple on-line sources , 1998 .

[41]  Regina Barzilay,et al.  Using Lexical Chains for Text Summarization , 1997 .

[42]  George M. Kasper,et al.  The Effects and Limitations of Automated Text Condensing on Reading Comprehension Performance , 1992, Inf. Syst. Res..

[43]  Jade Goldstein-Stewart,et al.  Summarizing text documents: sentence selection and evaluation metrics , 1999, SIGIR '99.

[44]  Chin-Yew Lin Training a selection function for extraction , 1999, CIKM '99.

[45]  Antonio Zamora,et al.  Automatic Abstracting Research at Chemical Abstracts Service , 1975, J. Chem. Inf. Comput. Sci..

[46]  Gerard Salton,et al.  Automatic Text Structuring and Summarization , 1997, Inf. Process. Manag..

[47]  Karen Sparck Jones,et al.  Book Reviews: Evaluating Natural Language Processing Systems: An Analysis and Review , 1996, CL.

[48]  Marti A. Hearst Text Tiling: Segmenting Text into Multi-paragraph Subtopic Passages , 1997, CL.

[49]  Harold Borko,et al.  Abstracting Concepts and Methods , 1975 .