Using WordNet for Building WordNets

This paper summarises a set of methodologies and techniques for the fast construction of multilingual WordNets. The English WordNet is used in this approach as a backbone for Catalan and Spanish WordNets and as a lexical knowledge resource for several subtasks.

1 Motivation and Introduction

One of the main issues in NLP activities in recent years is the increasingly fast development of generic language resources. Many such resources, including both software and lingware items (lexicons, lexical databases, grammars, corpora marked up in several ways), have been made available for research and industrial applications. Of special interest for knowledge-based NLP tasks is the availability of wide-coverage ontologies. The best-known ontologies (such as GUM, CYC, ONTOS, MICROKOSMOS, EDR or WORDNET; see [Gomez 98] for an extensive survey) differ to a great extent in several characteristics (e.g. broad coverage vs. domain specific, lexically oriented vs. conceptually oriented, granularity, kind of information placed in nodes, kind of relations, way of building, etc.). It is clear, however, that for a wide range of applications WordNet (WN) [Miller 90] has become a de facto standard. The success of WordNet has led to the emergence of several projects that aim at constructing WordNets for languages other than English (e.g., [Hamp & Feldweg 97], [Artale et al. 97]) or at developing multilingual WordNets. The most important project in this line is EuroWordNet (EWN, http://www.let.uva.nl/~ewn/), whose aim is to build a multilingual database with WordNets for several European languages (in the first phase, Dutch, Italian and Spanish in addition to English).

The construction of a WN for a language Lg (LgWN) can be tackled in different ways according to the lexical sources available.
Of course, manual construction can be undertaken quite straightforwardly and leads to the best results in terms of accuracy, but it has the important drawback of its cost. Other approaches have therefore been pursued that take advantage of available resources in fully automatic or semi-automatic ways.

Which are these lexical resources? Basically, four kinds of resources have been used: 1) the English WN (EnWN), as an initial skeleton to which the words of Lg are attached, 2) already existing taxonomies of Lg (both at word and at sense level), 3) bilingual (English and Lg) dictionaries and 4) monolingual (Lg) dictionaries.

All the approaches using EnWN as a skeleton are based on the assumption of a close conceptual similarity between English and Lg, such that most of the structure (relations) in EnWN can be maintained for LgWN. In the case of bilingual dictionaries, the usual approach is to try to link the English counterpart of each entry to synsets in EnWN and to assume that the entry can be linked to the same synsets. Monolingual dictionaries have been used basically as a source for extracting taxonomic (hypernym) links between words (or senses [Bruce & Guthrie 92], [Rigau et al. 97]) and, to a lesser extent, for extracting other kinds of semantic relations [Richardson 97] (e.g. meronymic links).

Once a taxonomy of Lg (already existing or built from a monolingual MRD) is available, the task can consist of 1) enriching the taxonomic structure with other semantic links (manually or automatically), as is the case when building individual WNs, or 2) merging this structure with other already existing ontologies (such as EnWN or EWN).

This paper presents our approach to the construction of WNs for two languages, Spanish and Catalan, and to linking the first one to EWN. We have developed a methodology that uses EnWN as its core source (we have used WordNet 1.5). The methodology implies 1)
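
The bilingual-dictionary approach described above (attach an Lg entry to the EnWN synsets of its English translations) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy data, the synset labels and the function name `link_entries` are all hypothetical, chosen only to show the general idea and the ambiguity it produces.

```python
from collections import defaultdict

# Toy stand-in for EnWN: English word -> set of synset labels.
# The labels are hypothetical, not real WordNet 1.5 identifiers.
EN_WN = {
    "dog": {"dog.n.01"},
    "bank": {"bank.n.01", "bank.n.02"},
    "shore": {"shore.n.01"},
}

# Toy bilingual dictionary (Spanish -> English); hypothetical entries.
BILINGUAL = {
    "perro": ["dog"],
    "banco": ["bank"],
    "orilla": ["shore", "bank"],
    "zaguán": ["hallway"],  # translation missing from the toy EnWN
}


def link_entries(bilingual, en_wn):
    """Attach each Lg word to every synset of its English translations."""
    links = defaultdict(set)
    for lg_word, translations in bilingual.items():
        for en_word in translations:
            # Words absent from the toy EnWN contribute no synsets.
            links[lg_word] |= en_wn.get(en_word, set())
    return dict(links)


links = link_entries(BILINGUAL, EN_WN)
for word in sorted(links):
    print(word, sorted(links[word]))
```

In this sketch a word with a single monosemous translation ("perro") receives exactly one candidate synset, whereas polysemous translations ("banco", "orilla") yield several candidates that would still need to be filtered or disambiguated.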

[1]  H. S. Pinto Knowledge Sharing and Reuse , 2022 .

[2]  Jay Liebowitz,et al.  The Handbook of Applied Expert Systems , 1997 .

[3]  P. Vossen,et al.  The EuroWordNet Base Concepts and Top Ontology , 1998 .

[4]  Carlo Strapparava,et al.  Lexical Discrimination with the Italian Version of WordNet , 1997 .

[5]  Helmut Feldweg,et al.  GermaNet - a Lexical-Semantic Net for German , 1997 .

[6]  David Yarowsky,et al.  Unsupervised Word Sense Disambiguation Rivaling Supervised Methods , 1995, ACL.

[7]  Samuel Gili Gaya,et al.  Vox : diccionario general ilustrado de la lengua española , 1954 .

[8]  Stephen D. Richardson,et al.  Determining similarity and inferring relations in a lexical knowledge base , 1997 .

[9]  Louise Guthrie,et al.  Disambiguation: a Study in Weighted Preference* , 2022 .

[10]  Uri Zernik,et al.  Lexical acquisition: Exploiting on-line resources to build a lexicon. , 1991 .

[11]  Kevin Knight,et al.  Building a Large-Scale Knowledge Base for Machine Translation , 1994, AAAI.

[12]  Horacio Rodríguez,et al.  Combining Multiple Methods for the Automatic Construction of Multilingual WordNets , 1997, ArXiv.

[13]  Eneko Agirre,et al.  Combining Unsupervised Lexical Knowledge Methods for Word Sense Disambiguation , 1997, ACL.

[14]  Eduard H. Hovy,et al.  Building Japanese-English Dictionary based on Ontology for Machine Translation , 1994, HLT.

[15]  Sergi Cervell,et al.  Methods and Tools for Building the Catalan WordNet , 1998, ArXiv.