Paper information - Discriminating Among Word Senses Using McQuitty's Similarity Analysis

This paper presents an unsupervised method for discriminating among the senses of a given target word based on the context in which it occurs. Instances of a word that occur in similar contexts are grouped together via McQuitty's Similarity Analysis, an agglomerative clustering algorithm. The context in which a target word occurs is represented by surface lexical features such as unigrams, bigrams, and second order co-occurrences. This paper summarizes our approach, and describes the results of a preliminary evaluation we have carried out using data from the SENSEVAL-2 English lexical sample and the line corpus.

Amruta Purandare
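The clustering step named in the abstract can be illustrated with a small sketch. It is not the paper's implementation. It assumes that McQuitty's update rule sets the similarity between a newly merged cluster and any other cluster to the plain average of the two merged clusters' similarities to it (the rule usually called WPGMA), that contexts are compared by cosine similarity, and that only unigram features are used. The function names and the four toy contexts are hypothetical and are not drawn from SENSEVAL-2 or the line corpus.

```python
import math
from collections import Counter


def unigram_vector(context, target):
    """Bag-of-words counts for one context, leaving out the target word."""
    return Counter(w for w in context.lower().split() if w != target)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mcquitty_cluster(contexts, target, k):
    """Agglomerative clustering with the assumed McQuitty (WPGMA) update.

    Returns k clusters as sorted lists of context indices.
    """
    k = max(k, 1)
    vectors = [unigram_vector(c, target) for c in contexts]
    clusters = {i: [i] for i in range(len(vectors))}
    # Pairwise similarities, keyed by (smaller id, larger id).
    sim = {(i, j): cosine(vectors[i], vectors[j])
           for i in clusters for j in clusters if i < j}
    next_id = len(vectors)
    while len(clusters) > k:
        a, b = max(sim, key=sim.get)  # most similar pair of clusters
        merged = clusters.pop(a) + clusters.pop(b)
        new_sim = {}
        for c in clusters:
            s_ac = sim[(min(a, c), max(a, c))]
            s_bc = sim[(min(b, c), max(b, c))]
            # Assumed McQuitty rule: unweighted mean of the two similarities.
            new_sim[(c, next_id)] = (s_ac + s_bc) / 2.0
        sim = {p: s for p, s in sim.items() if a not in p and b not in p}
        sim.update(new_sim)
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy contexts for the target word "line".
contexts = [
    "the phone line was busy all morning",
    "she waited on the phone line for an operator",
    "the company launched a new product line",
    "a new product line of shoes was launched",
]
print(mcquitty_cluster(contexts, "line", 2))  # [[0, 1], [2, 3]]
```

On these toy contexts the two telephone uses and the two product uses end up in separate clusters. Bigram and second-order co-occurrence features, which the abstract also mentions, would replace or extend `unigram_vector` in a fuller version.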
