Use of a text grammar for generating highlight abstracts of magazine articles

Browsing a database of article abstracts is one way to select and buy relevant magazine articles online. Our research contributes to the design and development of text grammars for abstracting texts in unlimited subject domains. We developed a system that parses a text according to the text grammar of its text type and extracts the sentences and statements that are relevant for inclusion in the abstract. The system employs knowledge of the discourse patterns that are typical of news stories. The results are encouraging and demonstrate the importance of discourse structures in text summarisation.
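
The general idea of grammar-driven extraction can be illustrated with a minimal sketch. It is not the system described above: the three-part grammar (headline, lead, body), the cue phrases in `CUE_PATTERNS`, the sentence splitter, and all function names are assumptions made for illustration only.

```python
import re

# Hypothetical cue phrases that mark a body sentence as relevant.
# The actual discourse patterns used by the system are not given in the abstract.
CUE_PATTERNS = [r"\bin short\b", r"\bthe key (point|question)\b"]


def split_sentences(paragraph):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def parse_article(text):
    """Segment an article with a toy grammar: article -> headline lead body*.

    Blocks are separated by blank lines (an assumption about the input).
    Returns a list of (segment_type, sentences) pairs.
    """
    blocks = [b.strip() for b in text.split("\n\n") if b.strip()]
    segments = []
    for i, block in enumerate(blocks):
        if i == 0:
            kind = "headline"
        elif i == 1:
            kind = "lead"
        else:
            kind = "body"
        segments.append((kind, split_sentences(block)))
    return segments


def highlight_abstract(text):
    """Select the first lead sentence plus body sentences that contain a cue phrase."""
    selected = []
    for kind, sentences in parse_article(text):
        if kind == "lead":
            selected.extend(sentences[:1])
        elif kind == "body":
            for sentence in sentences:
                if any(re.search(p, sentence, re.IGNORECASE) for p in CUE_PATTERNS):
                    selected.append(sentence)
    return selected


if __name__ == "__main__":
    article = (
        "Chip prices fall\n\n"
        "Chip prices fell sharply this spring. Analysts were surprised.\n\n"
        "Factories in Asia expanded output. In short, supply now exceeds demand."
    )
    for sentence in highlight_abstract(article):
        print(sentence)
```

The sketch separates parsing (assigning each block a segment type) from selection (deciding per segment type which sentences to keep), so the selection rules can change without touching the segmentation.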
