Word Sense Discrimination by Clustering Contexts in Vector and Similarity Spaces

This paper systematically compares unsupervised word sense discrimination techniques that cluster instances of a target word that occur in raw text using both vector and similarity spaces. The context of each instance is represented as a vector in a high dimensional feature space. Discrimination is achieved by clustering these context vectors directly in vector space and also by finding pairwise similarities among the vectors and then clustering in similarity space. We employ two different representations of the context in which a target word occurs. First order context vectors represent the context of each instance of a target word as a vector of features that occur in that context. Second order context vectors are an indirect representation of the context based on the average of vectors that represent the words that occur in the context. We evaluate the discriminated clusters by carrying out experiments using sense–tagged instances of 24 SENSEVAL2 words and the well known Line, Hard and Serve sense–tagged corpora.