Fast Generation of Abstracts from General Domain Text Corpora by Extracting Relevant Sentences

This paper describes a system for generating text abstracts that relies on a general, purely statistical principle: the notion of "relevance", defined as a combination of the tf*idf weights of the words in a sentence. The system generates abstracts of newspaper articles by selecting the "most relevant" sentences and presenting them in their original text order. Since neither domain knowledge nor text-type-specific heuristics are involved, the system offers maximal generality and flexibility. It is also fast and can be implemented efficiently for both on-line and off-line use. An experiment shows that recall and precision for the extracted sentences (taking the sentences extracted by human subjects as the baseline) are within the same range as recall and precision when the human subjects are compared with one another. In other words, the performance of the system is indistinguishable from that of a human abstractor. Finally, the system yields significantly better results than a default "lead" algorithm, which simply selects the first few sentences of the text.
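The abstract states the principle but not the exact formulas, so the Python sketch below is only an illustration under explicit assumptions, not the paper's implementation. It assumes that idf is estimated from a hypothetical background corpus with add-one smoothing, that term frequency is counted over the whole article, and that a sentence's relevance is the sum of the tf*idf weights of its tokens. The top k sentences are then returned in text order. For comparison it also includes the "lead" baseline and sentence-level precision/recall against a human-selected reference set. All function names (`tokenize`, `idf_weights`, `extract`, `lead`, `precision_recall`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphanumeric tokens, no stemming, no stop-word list.
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_weights(corpus):
    """Smoothed idf over a hypothetical background corpus (list of strings).

    Returns (idf, unseen_idf): a dict mapping word -> log((N + 1) / (df + 1)),
    plus the value used for words absent from the corpus (df = 0).
    """
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))  # count each word once per document
    idf = {w: math.log((n_docs + 1) / (c + 1)) for w, c in df.items()}
    return idf, math.log(n_docs + 1)


def extract(sentences, idf, unseen_idf, k):
    """Indices of the k highest-scoring sentences, returned in text order."""
    # Term frequency is counted over the whole article, not per sentence.
    tf = Counter(w for s in sentences for w in tokenize(s))

    def relevance(sentence):
        # Assumption: relevance = sum of the tf*idf weights of the sentence's
        # tokens (a repeated token contributes once per occurrence).
        return sum(tf[w] * idf.get(w, unseen_idf) for w in tokenize(sentence))

    ranked = sorted(range(len(sentences)),
                    key=lambda i: relevance(sentences[i]), reverse=True)
    return sorted(ranked[:k])  # restore original text order


def lead(sentences, k):
    """Baseline: simply take the first k sentences."""
    return list(range(min(k, len(sentences))))


def precision_recall(system, reference):
    """Precision and recall of selected sentence indices vs. a reference set."""
    sys_set, ref_set = set(system), set(reference)
    hits = len(sys_set & ref_set)
    precision = hits / len(sys_set) if sys_set else 0.0
    recall = hits / len(ref_set) if ref_set else 0.0
    return precision, recall


if __name__ == "__main__":
    corpus = ["the a the", "the b", "the c"]          # toy background corpus
    article = ["the the the", "tax tax rise", "the a"]  # toy article
    idf, unseen = idf_weights(corpus)
    picked = extract(article, idf, unseen, k=2)
    print(picked)                                       # [1, 2]
    print(precision_recall(picked, [1, 2]))             # (1.0, 1.0)
    print(precision_recall(lead(article, 2), [1, 2]))   # (0.5, 0.5)
```

Summing over tokens favors longer sentences. Normalizing the score by sentence length is an equally plausible reading of "combination", and the abstract does not say which variant the system uses.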

Klaus Zechner
