Generic technologies for single- and multi-document summarization

The technologies for single- and multi-document summarization described and evaluated in this article can be applied to heterogeneous texts and to different summarization tasks. They comprise extracting important sentences from the documents, compressing those sentences to their essential or relevant content, and detecting redundant content across sentences. The technologies were tested at the Document Understanding Conference, organized by the National Institute of Standards and Technology, USA, in 2002 and 2003. The system obtained good to very good results in this competition. We also tested our summarization system on a variety of English encyclopedia texts and on Dutch magazine articles. The results show that relying on generic linguistic resources and statistical techniques offers a basis for text summarization.
