SENSEVAL: an exercise in evaluating word sense disambiguation programs

There are now many computer programs for automatically determining which sense a word is being used in. One would like to be able to say which are better, which worse, and also which words, or varieties of language, present particular problems for which programs. In this paper I describe a pilot evaluation exercise ('SENSEVAL') taking place in 1998 under the auspices of ACL SIGLEX (the Lexicons Special Interest Group of the Association for Computational Linguistics), EURALEX (the European Association for Lexicography), ELSNET, and the EU projects SPARKLE and ECRAN.

1 Word Sense Disambiguation

As dictionaries tell us, most common words have more than one meaning. When a word is used in a book or in conversation, generally speaking just one of those meanings will apply. This is not a problem for people: we are very rarely slowed down in our comprehension by the need to work out which meaning of a word applies. But it is a problem for computers. The clearest case is Machine Translation. If English drug translates into French as either drogue or médicament, then an English-French MT system needs to disambiguate drug if it is to make the correct translation.

For forty years now, people have been writing computer programs to do Word Sense Disambiguation (WSD). Early programs required human experts to write sets of disambiguation rules for each multi-sense word. This was a problem: writing rule-sets or "Word Experts" for a substantial part of the vocabulary involved a huge amount of labour.

The WSD problem can be divided into two parts. The first is how to express to the computer what meaning number 1 and meaning number 2 of a word are. The second is how to work out which of those meanings matches an occurrence of the word to be disambiguated. Lesk (1986) took a novel tack, using the text of dictionary definitions as an off-the-shelf answer to the first problem. He then measured the overlap, in terms of words in common, between each of the definition texts and the context of the word to be disambiguated (a minimal sketch of this idea is given at the end of this section). Much recent work uses sophisticated variants of this idea. Dictionary-based approaches remain tied to a particular dictionary, however, with concomitant errors, imperfections and copyright constraints.

With the advent of huge computer corpora, and computers powerful enough to compute complex functions over them, the 1990s have seen new strategies which find the contexts indicative of each sense in a training corpus, and then find the best match between those contexts and the instance of a word to be disambiguated (Yarowsky, 1995).
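The following is a minimal Python sketch of Lesk's overlap idea. The sense labels, definitions and function name are invented for illustration, not drawn from any particular dictionary or system; a real implementation would filter out stop words and use a wider context window.

    def lesk_disambiguate(context_words, sense_definitions):
        """Pick the sense whose definition shares the most words with the context."""
        context = set(w.lower() for w in context_words)
        best_sense, best_overlap = None, -1
        for sense, definition in sense_definitions.items():
            # Overlap = number of words the definition shares with the context.
            overlap = len(context & set(definition.lower().split()))
            if overlap > best_overlap:
                best_sense, best_overlap = sense, overlap
        return best_sense

    # Invented definitions for the two senses of English "drug":
    senses = {
        "drogue":     "an illegal substance taken for its narcotic effects",
        "médicament": "a substance used in the medical treatment of disease",
    }
    print(lesk_disambiguate("the doctor prescribed a drug for the disease".split(), senses))
    # -> "médicament" ("the", "a" and "disease" overlap with its definition)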
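As a much-simplified illustration of this corpus-trained family of approaches, the sketch below trains a naive Bayes classifier over context words from a tiny, invented sense-tagged corpus and then matches a new context against it. This is a generic stand-in for the train-then-match idea described above, not Yarowsky's own method, which uses decision lists and unsupervised bootstrapping.

    import math
    from collections import Counter, defaultdict

    class NaiveBayesWSD:
        def train(self, tagged_examples):
            """tagged_examples: list of (context_words, sense) pairs."""
            self.sense_counts = Counter()
            self.word_counts = defaultdict(Counter)
            self.vocab = set()
            for words, sense in tagged_examples:
                self.sense_counts[sense] += 1
                for w in words:
                    self.word_counts[sense][w] += 1
                    self.vocab.add(w)
            self.total = sum(self.sense_counts.values())

        def disambiguate(self, context_words):
            """Return the sense with the highest add-one-smoothed log-probability."""
            def score(sense):
                n = sum(self.word_counts[sense].values())
                s = math.log(self.sense_counts[sense] / self.total)
                for w in context_words:
                    s += math.log((self.word_counts[sense][w] + 1) / (n + len(self.vocab)))
                return s
            return max(self.sense_counts, key=score)

    # Invented sense-tagged training contexts for English "drug":
    wsd = NaiveBayesWSD()
    wsd.train([
        ("police seized the illegal drug shipment".split(), "drogue"),
        ("the drug was prescribed for the infection".split(), "médicament"),
    ])
    print(wsd.disambiguate("dealers selling an illegal drug".split()))  # -> "drogue"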