Term extraction for automatic abstracting

In this paper we describe term extraction from full-length journal articles in the domain of crop husbandry, for the purpose of producing abstracts automatically. Initially, candidate terms are extracted which occur in one of a number of fixed lexical environments. These environments are identified by a system of contextual templates, which assigns a semantic role indicator to each candidate term. Candidate terms which can be lexically validated (that is, whose constituent words and structure conform to a simple grammar for their assigned role) receive an enhanced weight. The grammar for lexical validation was derived from a training corpus of 50 journal articles. Selected terms may be used to generate a short indicative abstract, which conveys the subject matter of the paper. We also describe a method for compiling a list of word sequences which indicate the statistical findings of an experiment, in particular the interrelationships between terms. Such word sequences, when extracted and appended to an indicative abstract, produce an informative abstract which describes specific research findings in addition to the subject matter of the paper.
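
The two-stage procedure described above (template matching, then lexical validation with an enhanced weight) might be sketched roughly as follows. This is only an illustrative sketch and not the authors' implementation: the template patterns, the role names CROP and PEST, the word lists in the grammar, and the weight values 1.0 and 2.0 are all assumptions invented for the example.

```python
import re

# Hypothetical contextual templates: each pairs a semantic role with a
# pattern whose single capture group is the candidate term.
TEMPLATES = [
    ("CROP", re.compile(r"\byield of ([a-z ]+?) (?:was|were)\b")),
    ("PEST", re.compile(r"\binfested (?:with|by) ([a-z ]+?)(?=[.,;]|$)")),
]

# Hypothetical per-role grammar: optional modifiers followed by one head word.
GRAMMAR = {
    "CROP": {"modifiers": {"winter", "spring"}, "heads": {"wheat", "barley"}},
    "PEST": {"modifiers": {"cereal", "grain"}, "heads": {"aphids", "slugs"}},
}

BASE_WEIGHT = 1.0       # assumed weight for any template match
VALIDATED_WEIGHT = 2.0  # assumed enhanced weight after lexical validation


def is_valid(role, term):
    """Return True if the term fits the (assumed) grammar for its role."""
    words = term.split()
    rules = GRAMMAR[role]
    return (
        bool(words)
        and words[-1] in rules["heads"]
        and all(word in rules["modifiers"] for word in words[:-1])
    )


def extract_terms(text):
    """Map each (role, candidate term) pair to its accumulated weight."""
    weights = {}
    lowered = text.lower()
    for role, pattern in TEMPLATES:
        for match in pattern.finditer(lowered):
            term = match.group(1).strip()
            weight = VALIDATED_WEIGHT if is_valid(role, term) else BASE_WEIGHT
            key = (role, term)
            weights[key] = weights.get(key, 0.0) + weight
    return weights
```

Calling extract_terms on a sentence such as "The yield of winter wheat was reduced." would assign the candidate "winter wheat" the role CROP and, because it passes the toy grammar, the enhanced weight. A candidate that matches a template but fails validation keeps the base weight rather than being discarded, which mirrors the description of validation as a boost rather than a filter.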
