Multilingual Terminology Extraction and Validation

This paper presents the automatic terminology extraction approach developed within the LIQUID project. The project aims to develop a cost-effective solution to the problem of cross-language access to multilingual text databases in technical and scientific domains. Cross-Language Information Retrieval faces a major challenge: organizing unstructured textual information according to its content, regardless of its language. Our solution is based on two main components: a terminology extraction tool and a domain-specific ontology. The terminology extraction tool identifies the terms that describe the content of a particular document; these terms are then linked to the domain-specific ontology. We describe the terminology extraction tool and report the experimental results obtained in the domain of gastroenterology.
