Paper information - The Effect of Bias on an Automatically-built Word Sense Corpus

The Effect of Bias on an Automatically-built Word Sense Corpus

The goal of this paper is to explore the large-scale automatic acquisition of sense-tagged examples to be used for Word Sense Disambiguation (WSD). We applied the “monosemous relatives” method to the Web in order to build such a resource for all nouns in WordNet. An analysis of several parameters revealed that the distribution of word senses (bias) in the training and test corpora is a determining factor. Provided there is a method to approximate the bias for each word sense, the results we obtained for English are comparable to those obtained with hand-tagged data (Semcor), which is a promising prospect for less-studied languages.

Eneko Agirre | David Martínez
