Using Lexical Chains for Text Summarization

We investigate one technique for producing a summary of an original text that does not require its full semantic interpretation. Instead, it relies on a model of the topic progression in the text, derived from lexical chains. We present a new algorithm to compute lexical chains in a text that merges several robust knowledge sources: the WordNet thesaurus, a part-of-speech tagger, a shallow parser for the identification of nominal groups, and a segmentation algorithm. Summarization proceeds in four steps: the original text is segmented, lexical chains are constructed, strong chains are identified, and significant sentences are extracted. In this paper we present empirical results on the identification of strong chains and of significant sentences. Preliminary results indicate that the method produces good-quality indicative summaries. We identify pending problems and briefly present plans to address these shortcomings.
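The chain-construction, strong-chain and extraction steps above can be illustrated with a toy Python sketch. It is not the authors' algorithm, and it rests on several assumptions. The hypothetical `RELATED` table stands in for WordNet relations. Nouns are assumed to be already extracted by a tagger and shallow parser. Segmentation is skipped. Each noun is greedily attached to the first chain containing a related word, and word senses are ignored. The chain score (length times homogeneity) and the strong-chain threshold (mean plus `k` standard deviations) are likewise assumptions chosen for illustration. All function names are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical toy relatedness table standing in for WordNet relations
# (synonymy, hypernymy, ...); a real system would query a thesaurus.
RELATED = {
    ("machine", "computer"),
    ("computer", "device"),
    ("person", "user"),
}


def related(a, b):
    """Identical words, or a listed pair in either order, count as related."""
    return a == b or (a, b) in RELATED or (b, a) in RELATED


def build_chains(sentence_nouns):
    """Greedily attach each noun to the first chain holding a related word.

    `sentence_nouns` has one list of nouns per sentence (assumed to come from
    a tagger and shallow parser). A chain is a list of (word, sentence_index)
    occurrences in text order. Word senses are ignored in this sketch.
    """
    chains = []
    for i, nouns in enumerate(sentence_nouns):
        for noun in nouns:
            for chain in chains:
                if any(related(noun, w) for w, _ in chain):
                    chain.append((noun, i))
                    break
            else:  # no related chain found: start a new one
                chains.append([(noun, i)])
    return chains


def score(chain):
    """Assumed score: length * homogeneity, homogeneity = 1 - distinct/length."""
    length = len(chain)
    distinct = len({w for w, _ in chain})
    return length * (1 - distinct / length)


def strong_chains(chains, k=2.0):
    """Keep chains scoring more than k population std devs above the mean."""
    if not chains:
        return []
    scores = [score(c) for c in chains]
    threshold = mean(scores) + k * pstdev(scores)
    return [c for c, s in zip(chains, scores) if s > threshold]


def summarize(sentences, sentence_nouns, k=2.0):
    """For each strong chain, extract the sentence where it first appears."""
    chains = build_chains(sentence_nouns)
    picked = sorted({c[0][1] for c in strong_chains(chains, k)})
    return [sentences[i] for i in picked]


if __name__ == "__main__":
    sentences = [
        "The computer helps the user.",
        "The machine sits in the office.",
        "Any person can use the computer.",
        "A computer is a device.",
        "The weather is nice.",
    ]
    nouns = [
        ["computer", "user"],
        ["machine", "office"],
        ["person", "computer"],
        ["computer", "device"],
        ["weather"],
    ]
    print(summarize(sentences, nouns, k=1.0))
```

With only a handful of chains, a threshold of two standard deviations can exclude every chain. The example therefore uses `k=1.0`, which keeps only the computer/machine/device chain and extracts "The computer helps the user." On longer texts, a larger `k` may be more appropriate.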
