Evaluation of Automatic Text Summarization

Summaries are an important tool for familiarizing oneself with a subject area. Text summaries are essential when deciding whether reading a document in full is necessary for acquiring further knowledge. In other words, summaries save time in our daily work. Writing a summary of a text is a non-trivial process in which one must, on the one hand, extract the most central information from the original text and, on the other, consider the reader of the summary, her prior knowledge and possible special interests. Today numerous documents, papers, reports and articles are available in digital form, but most of them lack summaries. The information they contain is often too abundant to be searched, sifted and selected manually. It must instead be filtered and extracted automatically in order to avoid drowning in it.

Automatic text summarization is a technique in which a computer summarizes a text: given a text, the computer returns a shorter, less redundant extract of the original. So far automatic text summarization has not reached the quality possible with manual summarization, where a human interprets the text and writes a completely new, shorter text with new lexical and syntactic choices. However, automatic text summarization is untiring, consistent and always available.

Evaluating summaries and automatic text summarization systems is not a straightforward process. What exactly makes a summary beneficial is an elusive property. Generally speaking, there are at least two properties of a summary that must be measured when evaluating summaries and summarization systems: the Compression Ratio, i.e. how much shorter the summary is than the original, and the Retention Ratio, i.e. how much of the central information is retained. This can be accomplished, for example, by comparison with existing summaries of the given text. One must also evaluate the qualitative properties of the summaries, for example how coherent and readable the text is; this is usually done with a panel of human judges. Furthermore, one can perform task-based evaluations that try to discern to what degree the resulting summaries are beneficial for the completion of a specific task.

This licentiate thesis thus concerns itself with the many-faceted art of evaluation. It focuses on different aspects of creating an environment for evaluating information extraction systems, with automatic text summarization as its centre of interest. The main body of this work consists of developing human language technology evaluation tools for Swedish, a language that has been lacking such tools. Starting from a manual and time-consuming evaluation of the Swedish text summarizer SweSum using a question-answering schema, the thesis moves on to a semi-automatic evaluation in which an extract corpus, collected from human informants, can be used repeatedly to evaluate text summarizers at low cost in time and effort. Thus, the licentiate thesis describes the first summarization evaluation resources and tools for Swedish, and aims at bringing, if not order, then at least overview into chaos.
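As an illustration of the two quantitative measures named above (the abstract does not fix the unit of length; words, sentences or characters are all common choices), the ratios can be written as:

    Compression Ratio (CR) = length(summary) / length(source)
    Retention Ratio   (RR) = information(summary) / information(source)

A good summarizer keeps CR low while keeping RR high: a summary one tenth the length of its source that preserves half of the source's central information has CR = 0.1 and RR = 0.5.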

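The semi-automatic, extract-corpus-based evaluation described above can be pictured with a minimal Python sketch. The sentence-overlap scoring below, and the function names in it, are illustrative assumptions for an extractive setting, not the actual protocol used with SweSum: sentences are identified by their position in the source text, so a system extract can be compared as an index set against the extracts chosen by human informants.

    # Sketch: scoring an extractive summarizer against a gold extract corpus.
    # System and human extracts are sets of sentence indices into the source.

    def extract_scores(system: set[int], gold: set[int]) -> tuple[float, float, float]:
        """Precision, recall and F1 of a system extract against one gold extract."""
        if not system or not gold:
            return 0.0, 0.0, 0.0
        overlap = len(system & gold)
        precision = overlap / len(system)
        recall = overlap / len(gold)
        f1 = (2 * precision * recall / (precision + recall)) if overlap else 0.0
        return precision, recall, f1

    def corpus_score(system: set[int], gold_extracts: list[set[int]]) -> float:
        """Average F1 over all informants' extracts for one text."""
        return sum(extract_scores(system, g)[2] for g in gold_extracts) / len(gold_extracts)

    # Example: the system picked sentences 0, 2 and 5; two informants
    # picked {0, 2, 3} and {2, 5, 7} respectively.
    print(corpus_score({0, 2, 5}, [{0, 2, 3}, {2, 5, 7}]))  # 0.667

Because the gold extracts are collected once and then reused, each new summarizer version can be scored automatically, which is what makes this evaluation cheap in time and effort compared with the manual question-answering schema.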