Tor, TorMd: Distributional Profiles of Concepts for Unsupervised Word Sense Disambiguation

Words in the context of a target word have long been used as features by supervised word-sense classifiers. Mohammad and Hirst (2006a) proposed a way to determine the strength of association between a sense or concept and its co-occurring words, the distributional profile of a concept (DPC), without the use of manually annotated data. Using these DPCs, we implemented an unsupervised naive Bayes word sense classifier that was the best, or within one percentage point of the best, among the unsupervised systems in the Multilingual Chinese-English Lexical Sample Task (task #5) and the English Lexical Sample Task (task #17). We also created a simple PMI-based classifier to attempt the English Lexical Substitution Task (task #10); however, its performance was poor.
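As a rough illustration of how a naive Bayes classifier might use sense-word association counts, the sketch below picks the sense that maximizes the log prior plus the summed log likelihoods of the context words. It is not the system described in the abstract: the counts in `DPC_COUNTS`, the function name `naive_bayes_sense`, the add-`alpha` smoothing, and the choice to estimate sense priors from the co-occurrence totals are all assumptions made for the example.

```python
import math

# Hypothetical sense-word co-occurrence counts (toy numbers, not from the paper).
DPC_COUNTS = {
    "FINANCE": {"money": 8, "loan": 6, "water": 1},
    "RIVER": {"water": 7, "shore": 5, "money": 1},
}


def naive_bayes_sense(context, dpc_counts, alpha=1.0):
    """Return the sense maximizing log P(sense) + sum of log P(word | sense).

    Assumption: sense priors are estimated from the co-occurrence totals,
    and likelihoods use add-alpha smoothing over the observed vocabulary.
    """
    vocab = {w for words in dpc_counts.values() for w in words}
    totals = {s: sum(words.values()) for s, words in dpc_counts.items()}
    grand_total = sum(totals.values())

    scores = {}
    for sense, words in dpc_counts.items():
        score = math.log(totals[sense] / grand_total)  # log prior
        for w in context:
            if w not in vocab:
                continue  # ignore words never seen with any sense
            smoothed = (words.get(w, 0) + alpha) / (totals[sense] + alpha * len(vocab))
            score += math.log(smoothed)
        scores[sense] = score
    return max(scores, key=scores.get)


print(naive_bayes_sense(["water", "shore"], DPC_COUNTS))
print(naive_bayes_sense(["money", "loan"], DPC_COUNTS))
```

With these toy counts, a context containing "water" and "shore" favors the RIVER sense, while "money" and "loan" favor FINANCE. In an unsupervised setting the counts themselves would have to come from unannotated text rather than from sense-tagged data.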

[1]  David Yarowsky, et al. Word-Sense Disambiguation Using Statistical Models of Roget’s Categories Trained on Large Corpora, 1992, COLING.

[2]  Hinrich Schütze, et al. Book Reviews: Foundations of Statistical Natural Language Processing, 1999, CL.

[3]  Philip Resnik. WordNet and class-based probabilities, 1998.

[4]  Peng Jin, et al. SemEval-2007 Task 05: Multilingual Chinese-English Lexical Sample, 2007, SemEval@ACL.

[5]  Julie Weeds, et al. Finding Predominant Word Senses in Untagged Text, 2004, ACL.

[6]  Graeme Hirst, et al. Distributional measures of concept-distance: A task-oriented evaluation, 2006, EMNLP.

[7]  Graeme Hirst, et al. Cross-Lingual Distributional Profiles of Concepts for Measuring Semantic Distance, 2007, EMNLP.

[8]  Graeme Hirst, et al. Determining Word Sense Dominance Using a Thesaurus, 2006, EACL.

[9]  Martha Palmer, et al. SemEval-2007 Task-17: English Lexical Sample, SRL and All Words, 2007, Fourth International Workshop on Semantic Evaluations (SemEval-2007).

[10]  Roberto Navigli, et al. SemEval-2007 Task 10: English Lexical Substitution Task, 2007, Fourth International Workshop on Semantic Evaluations (SemEval-2007).

[11]  Christiane Fellbaum, et al. WordNet and Class-Based Probabilities, 1998.

[12]  Mitchell P. Marcus, et al. OntoNotes: The 90% Solution, 2006, NAACL.

[13]  David W. Conrath, et al. Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy, 1997, ROCLING/IJCLCLP.

[14]  Diana McCarthy, et al. SemEval-2007 Task 10: English Lexical Substitution Task, 2007, *SEMEVAL.