A Formal Model for Information Selection in Multi-Sentence Text Extraction

Selecting important information while accounting for repetition is a hard task for both summarization and question answering. We propose a formal model that represents a collection of documents in a two-dimensional space of textual and conceptual units, with an associated mapping between these two dimensions. This representation is then used to formulate the selection of textual units for a summary or answer as a formal optimization task. We provide approximation algorithms and empirically validate the performance of the proposed model with two very different sets of features: words and atomic events.
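The abstract does not spell out the optimization objective. As an illustration only, the sketch below assumes one common instance of such a task: each textual unit is represented by the set of conceptual units it maps to, each conceptual unit carries an importance weight, and the goal is to choose at most k textual units so that the total weight of distinct conceptual units covered is maximal (a weighted maximum coverage problem). Because a conceptual unit counts only once, repeated information earns no extra credit. The objective is approximated greedily. The names `greedy_select`, `units`, `weights`, and `k` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional, Set


def greedy_select(
    units: Dict[str, Set[str]],
    weights: Dict[str, float],
    k: int,
) -> List[str]:
    """Greedily pick up to k textual units covering the most concept weight.

    Assumed formulation (illustrative, not from the source): weighted
    maximum coverage with a limit of k selected textual units.

    units:   textual unit id -> set of conceptual unit ids it maps to
    weights: conceptual unit id -> importance weight (missing ids count as 0)
    k:       maximum number of textual units to select
    """
    covered: Set[str] = set()
    selected: List[str] = []
    chosen: Set[str] = set()
    for _ in range(k):
        best_id: Optional[str] = None
        best_gain = 0.0
        for uid, concepts in units.items():
            if uid in chosen:
                continue
            # Only not-yet-covered concepts add to the gain, so a unit that
            # merely repeats earlier information is not rewarded again.
            gain = sum(weights.get(c, 0.0) for c in concepts - covered)
            if gain > best_gain:
                best_id, best_gain = uid, gain
        if best_id is None:
            # No remaining unit contributes any new conceptual unit.
            break
        selected.append(best_id)
        chosen.add(best_id)
        covered |= units[best_id]
    return selected
```

For example, with `units = {"s1": {"quake", "city"}, "s2": {"quake", "city", "deaths"}, "s3": {"rescue"}, "s4": {"quake", "deaths"}}` and `weights = {"quake": 3.0, "city": 1.0, "deaths": 2.0, "rescue": 1.5}`, calling `greedy_select(units, weights, 2)` first picks `s2` (gain 6.0) and then `s3`, since `s1` and `s4` add nothing new once `s2` is chosen. Under a cardinality limit like this one, the greedy rule is known to reach at least a (1 - 1/e) fraction of the optimal coverage; other constraints, such as a word-length budget, would need a modified selection rule.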
