Automatic summarising: The state of the art

This paper reviews research on automatic summarising over the last decade. This work has grown, stimulated by technology and by evaluation programmes. The paper uses several frameworks to organise the review: for summarising itself, for the factors affecting summarising, for systems, and for evaluation. The review examines the evaluation strategies applied to summarising, the issues they raise, and the major programmes. It considers the input, purpose and output factors investigated in recent summarising research, and discusses the classes of strategy, extractive and non-extractive, that have been explored, illustrating the range of systems built. The conclusions drawn are that automatic summarisation has made valuable progress, with useful applications, better evaluation, and a better understanding of the task. But summarising systems are still poorly motivated in relation to the factors affecting them, and evaluation needs to be taken much further to engage with the purposes summaries are intended to serve and the contexts in which they are used.
