Paper Information - Identifying Similar Words and Contexts in Natural Language with SenseClusters

SenseClusters is a freely available intelligent system that clusters together similar contexts in natural language text. Thereafter it assigns identifying labels to these clusters based on their content. It is a purely unsupervised approach that is language independent, and uses no knowledge other than what is available in raw un-annotated corpora. In addition to clustering similar contexts, it can be used to identify synonyms and sets of related words. It has been applied to a diverse range of problems, including proper name disambiguation, word sense discrimination, email organization, and document clustering. SenseClusters is a complete system that supports feature selection from large corpora, several different context representation schemes, various clustering algorithms, the creation of descriptive and discriminating labels for the discovered clusters, and evaluation relative to gold standard data.
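The pipeline the abstract outlines (represent each context as a feature vector, cluster similar contexts, then label each cluster from its content) can be illustrated with a minimal sketch. This is not the SenseClusters implementation: the stopword list, the similarity threshold, the greedy single-pass clustering, and all function names below are assumptions chosen for brevity, and single words stand in for whatever features the real system selects.

```python
import math
from collections import Counter

# Hypothetical stopword list. The target word "bank" is excluded too,
# because it occurs in every context and cannot separate them.
STOPWORDS = frozenset({"the", "and", "on", "a", "of", "she", "after", "bank"})


def vectorize(context):
    """Represent a context as a bag-of-words count vector."""
    return Counter(w for w in context.lower().split() if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_contexts(contexts, threshold=0.2):
    """Greedy single-pass clustering (an assumed stand-in algorithm).

    Each context joins the most similar existing cluster if the similarity
    to that cluster's summed vector reaches the threshold; otherwise it
    starts a new cluster.
    """
    clusters = []
    for index, text in enumerate(contexts):
        vec = vectorize(text)
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [index]})
        else:
            best["centroid"].update(vec)  # add counts into the cluster vector
            best["members"].append(index)
    return clusters


def top_words(cluster, k):
    """Descriptive label: the k most frequent words in the cluster."""
    return [word for word, _ in cluster["centroid"].most_common(k)]


contexts = [
    "the bank approved the loan and the mortgage",
    "she sat on the river bank watching the water",
    "the bank raised interest on the loan",
    "the river bank flooded after heavy water flow",
]
clusters = cluster_contexts(contexts)
for cluster in clusters:
    print(cluster["members"], top_words(cluster, 2))
```

On this toy input the financial and river contexts of "bank" fall into separate clusters without any annotated data, which is the unsupervised behavior the abstract describes. A real system would need a principled clustering algorithm and feature selection over a large corpus.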

Ted Pedersen | Anagha Kulkarni

[1] George Karypis, et al. CLUTO - A Clustering Toolkit, 2002.

[2] Hinrich Schütze. Automatic Word Sense Discrimination, 1998.

[3] Amruta Purandare. Discriminating Among Word Senses Using McQuitty's Similarity Analysis, 2003, HLT-NAACL.

[4] Ted Pedersen, et al. Word Sense Discrimination by Clustering Contexts in Vector and Similarity Spaces, 2004, CoNLL.

[5] Hinrich Schütze, et al. Automatic Word Sense Discrimination, 1998, Comput. Linguistics.

[6] G. Miller, et al. Contextual correlates of semantic similarity, 1991.

[7] Ted Pedersen, et al. Distinguishing Word Senses in Untagged Text, 1997, EMNLP.

[8] Anagha Kulkarni. Unsupervised Discrimination and Labeling of Ambiguous Names, 2005, ACL.

[9] Ted Pedersen, et al. Name Discrimination by Clustering Similar Contexts, 2005, CICLing.