Methods and Tools for Building the Catalan WordNet

Abstract

This paper introduces the methodology and the basic phases we followed to develop the Catalan WordNet, together with the lexical resources employed in building it. The methodology and the tools we used were designed to be general, so that they can be applied to any other language.

1. Introduction

In recent years, research in the Natural Language Processing (NLP) field has shown the need for extensive and complete Lexical Knowledge Bases (LKBs). Acquiring such lexical/semantic structures is a hard problem and has usually been approached by reusing, merging and tuning existing lexical material. While for English the "lexical bottleneck" problem seems to have been eased (e.g. WordNet (Miller, 1990), Alvey Lexicon (Grover et al., 1993), COMLEX (Grishman et al., 1994) and so on), there are no wide-coverage lexicons available for NLP in other languages. Manual construction of lexicons is the most reliable technique for obtaining structured lexicons, but it is costly and highly time-consuming. For this reason, many researchers have focused on the massive acquisition of lexical knowledge and semantic information from pre-existing structured lexical resources, as automatically as possible. The work presented here shows a fast production methodology for building large-scale multilingual LKBs from conventional dictionaries.

In this context, the English WordNet developed at Princeton University is becoming a de facto standard for the lexical-semantic representation of English. Taking this English WordNet as a reference, new WordNets are being built for other languages, such as Galician, Basque, Spanish and Catalan.
In addition, multilingual links are established automatically through the connections with the English WordNet.

The research presented here is part of the Catalan WordNet project, which is being carried out by the Grup de Tractament de Llenguatge Natural (GTLN-UPC-UB) of the Centre de Referencia en Enginyeria Linguistica (CREL)