Using Multilingual Terms for Biomedical Term Extraction

The goal of automatic term extraction is often not so much the creation of a new list of domain-specific terms as the (semi-)automatic extension of a list of known terms. In this paper, we focus on using existing terms from glossaries, thesauri, or ontologies to extract new terms from domain-specific text. Our new method extracts language-specific terms with the help of multilingual terminological resources. Our baseline system combines a linguistic pattern for extracting candidate noun phrases with a statistical method (χ²) for ranking candidate phrases according to their association strength in a domain-specific corpus. Our scoring method also takes into account the termhood of candidate phrases, computed on the basis of a list of known terms. We show that the uninterpolated average precision of the resulting term list is improved when tested using human evalu-
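The ranking and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the system described in the paper. It assumes candidates are plain adjacent word pairs rather than noun phrases matched by a linguistic pattern, it omits the termhood component, and it normalizes average precision by the number of correct terms found in the ranked list. The function names chi_square, rank_bigrams, and average_precision are hypothetical.

```python
from collections import Counter


def chi_square(o11, o12, o21, o22):
    """Pearson chi-square for a 2x2 contingency table of bigram counts.

    o11: count of (w1, w2); o12: (w1, not w2);
    o21: (not w1, w2);      o22: (not w1, not w2).
    """
    n = o11 + o12 + o21 + o22
    denom = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    if denom == 0:
        return 0.0  # degenerate table: no association can be measured
    return n * (o11 * o22 - o12 * o21) ** 2 / denom


def rank_bigrams(tokens):
    """Rank adjacent word pairs by chi-square association strength.

    Assumption: every adjacent pair is a candidate; the paper instead
    uses a linguistic pattern to select candidate noun phrases.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = sum(bigrams.values())
    first, second = Counter(), Counter()
    for (w1, w2), count in bigrams.items():
        first[w1] += count   # how often w1 opens a bigram
        second[w2] += count  # how often w2 closes a bigram
    scores = {}
    for (w1, w2), o11 in bigrams.items():
        o12 = first[w1] - o11
        o21 = second[w2] - o11
        o22 = n - o11 - o12 - o21
        scores[(w1, w2)] = chi_square(o11, o12, o21, o22)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def average_precision(ranked, correct):
    """Uninterpolated average precision of a ranked candidate list.

    Precision is taken at each rank holding a correct term and averaged
    over the correct terms found in the list (an assumed normalization).
    """
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in correct:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


if __name__ == "__main__":
    text = "heart failure and heart failure or lung disease".split()
    for pair, score in rank_bigrams(text):
        print(pair, round(score, 3))
```

In this sketch a list of known terms would enter only at evaluation time, as the set of correct items. How the paper folds termhood into the score is not stated in the abstract, so it is left out here.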
