The ARC A3 Project: Terminology Acquisition Tools: Evaluation Method and Task

This paper describes the work carried out in the Concerted Research Project ARC A3, supported and coordinated by the AUF (formerly Aupelf-Uref). The project deals with the evaluation of term and semantic relation extraction from French corpora. Eight participants, from both public institutions and industrial corporations, were involved in the project; they were responsible for producing corpora suitable for extraction tasks and for elaborating a protocol to evaluate terminology acquisition tools objectively. The expression "terminology acquisition tools" covers term extractors, classifiers and semantic relation extractors. The paper also reports on the methodology used to compare four term extractors, one classifier and three semantic relation extractors during the 2000 evaluation campaign. The campaign also yielded several by-products: first, two corpora that can be used for NLP system development and evaluation, as the AUF recommended; second, terminology products, namely, for each corpus, a list of terms characterizing the field. Rather than detailing the results, we assess what the evaluation of terminology extraction tools involves: how it was done, what difficulties arose, what the advantages and disadvantages of the adopted protocol are, what its limits are, and how future testing should proceed.
