Lexical taxonomies such as WordNet are important resources underlying natural language processing techniques such as machine translation and word-sense disambiguation. However, creating and maintaining such taxonomies manually is tedious and time-consuming, which has led to a great deal of interest in automatic methods for acquiring taxonomic relations. Both manual taxonomy development and automatic acquisition methods have largely focused on English-language resources. Although WordNets exist for many languages, these are usually much smaller than Princeton WordNet (PWN) [1], the major semantic resource for English. For English, an excellent method has been developed for automatically extending lexical taxonomies: Snow et al. [4] show that it is possible to predict new and precise hypernym-hyponym relations from a parsed corpus. Unfortunately, the method cannot easily be applied to other languages, because it relies on existing lexical resources in two ways. First, the authors use the large amount of data already contained in PWN to extract a pattern lexicon and train their hypernym classifier. Second, the authors apply word-sense disambiguation based on an existing sense-tagged corpus. Transferring the method to another language is therefore hindered by the small size of non-English WordNets and the lack of sense-tagged corpora for those languages. In this paper we apply the method of [4] to Dutch, a language for which only a basic WordNet and a parser are available. We find that, without an additional sense-tagged corpus, this approach is highly susceptible to noise caused by word-sense ambiguity. We propose and evaluate two methods to address this problem.
[1] Gertjan van Noord et al. The Alpino Dependency Treebank. CLIN, 2001.
[2] Lonneke van der Plas et al. Automatic Acquisition of Lexico-semantic Knowledge for QA. IJCNLP, 2005.
[3] Piek Vossen et al. EuroWordNet: A multilingual database with lexical semantic networks. Springer Netherlands, 1998.
[4] Daniel Jurafsky et al. Learning Syntactic Patterns for Automatic Hypernym Discovery. NIPS, 2004.
[5] Christiane Fellbaum et al. Book Reviews: WordNet: An Electronic Lexical Database. CL, 1999.