Paper Information - A Supervised Method of Feature Weighting for Measuring Semantic Relatedness

A Supervised Method of Feature Weighting for Measuring Semantic Relatedness

The clustering of related words is crucial for a variety of Natural Language Processing applications. Many known techniques of word clustering use the context of a word to determine its meaning: words which frequently appear in similar contexts are assumed to have similar meanings. Word clustering usually weights contexts by some measure of their importance. One of the most popular measures is Pointwise Mutual Information (PMI). It increases the weight of contexts where a word appears regularly but other words do not, and decreases the weight of contexts where many words may appear. Essentially, it is unsupervised feature weighting. We present a method of supervised feature weighting. It identifies contexts shared by pairs of words known to be semantically related or unrelated, and then uses Pointwise Mutual Information to weight these contexts by how well they indicate closely related words. We use Roget's Thesaurus as a source of training and evaluation data. This work is a step towards adding new terms to Roget's Thesaurus automatically, and doing so with high confidence.
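The two weighting schemes described in the abstract can be illustrated with a minimal sketch. The first function computes standard Pointwise Mutual Information between words and contexts. The second is only one possible reading of the supervised idea: it applies the same formula to the association between a context and the label "related", using counts of known related and unrelated word pairs that share the context. The function names, the context labels, and the toy counts are all hypothetical, and the abstract does not confirm that the authors' method takes exactly this form.

```python
import math
from collections import Counter


def pmi_weights(pair_counts):
    """Unsupervised PMI for (word, context) pairs.

    pair_counts maps (word, context) -> co-occurrence count.
    PMI(w, c) = log2( P(w, c) / (P(w) * P(c)) ).
    """
    total = sum(pair_counts.values())
    word_totals = Counter()
    context_totals = Counter()
    for (word, context), n in pair_counts.items():
        word_totals[word] += n
        context_totals[context] += n
    return {
        (word, context): math.log2(
            n * total / (word_totals[word] * context_totals[context])
        )
        for (word, context), n in pair_counts.items()
        if n > 0  # PMI is undefined for pairs that never co-occur
    }


def supervised_context_weights(shared_by_related, shared_by_unrelated):
    """Assumed supervised variant: PMI between a context and the label 'related'.

    shared_by_related[c]   = number of known related word pairs sharing context c
    shared_by_unrelated[c] = number of known unrelated word pairs sharing context c
    """
    total_related = sum(shared_by_related.values())
    total = total_related + sum(shared_by_unrelated.values())
    weights = {}
    for context, related in shared_by_related.items():
        if related == 0:
            continue  # no evidence that this context indicates relatedness
        context_total = related + shared_by_unrelated.get(context, 0)
        weights[context] = math.log2(
            related * total / (context_total * total_related)
        )
    return weights


# Toy counts, invented for illustration only.
counts = {
    ("cat", "subj_of:purr"): 4,
    ("cat", "subj_of:eat"): 4,
    ("dog", "subj_of:eat"): 4,
    ("dog", "subj_of:bark"): 4,
}
unsupervised = pmi_weights(counts)

related = {"obj_of:pet": 6, "obj_of:see": 2}
unrelated = {"obj_of:pet": 2, "obj_of:see": 6}
supervised = supervised_context_weights(related, unrelated)
```

In the toy data, the context shared by both words (`subj_of:eat`) receives a lower unsupervised weight than the context specific to one word (`subj_of:purr`), which matches the behaviour the abstract attributes to PMI. In the supervised sketch, a context shared mostly by related pairs gets a positive weight and a context shared mostly by unrelated pairs gets a negative one.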

Alistair Kennedy | Stan Szpakowicz
