Good Neighbors Make Good Senses: Exploiting Distributional Similarity for Unsupervised WSD

We present an automatic method for sense-labeling text in an unsupervised manner. The method uses distributionally similar words to derive an automatically labeled training set, which is then used to train a standard supervised classifier for distinguishing word senses. Experimental results on the Senseval-2 and Senseval-3 datasets show that our approach yields significant improvements over state-of-the-art unsupervised methods and is competitive with supervised ones, while eliminating the cost of manual sense annotation.
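The pipeline the abstract describes can be sketched in three steps: map distributional neighbors of an ambiguous word to senses, harvest the contexts of those neighbors from unlabeled text as automatically labeled examples, and train a supervised classifier on them. This is a minimal illustration under stated assumptions, not the paper's implementation. The abstract does not say how each neighbor is tied to a sense, so the hypothetical `neighbor_to_sense` dictionary below is hand-specified. The toy corpus, the context window, and the helper names `build_training_set` and `NaiveBayesWSD` are likewise invented for illustration, and multinomial Naive Bayes stands in for whichever "standard supervised classifier" the method actually uses.

```python
import math
from collections import Counter, defaultdict


def build_training_set(corpus, neighbor_to_sense, window=3):
    """Turn unlabeled sentences into (context, sense) pairs.

    Hypothetical helper: every occurrence of a distributional neighbor is
    labeled with the sense that neighbor is *assumed* to indicate.
    """
    examples = []
    for sentence in corpus:
        for i, token in enumerate(sentence):
            sense = neighbor_to_sense.get(token)
            if sense is None:
                continue
            # Words within `window` positions on either side, excluding the neighbor itself.
            context = sentence[max(0, i - window):i] + sentence[i + 1:i + 1 + window]
            examples.append((context, sense))
    return examples


class NaiveBayesWSD:
    """Multinomial Naive Bayes over bag-of-context-words (a stand-in classifier)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # add-alpha (Laplace) smoothing

    def fit(self, examples):
        self.sense_counts = Counter(sense for _, sense in examples)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for context, sense in examples:
            self.word_counts[sense].update(context)
            self.vocab.update(context)
        self.total = sum(self.sense_counts.values())
        return self

    def predict(self, context):
        """Return the sense with the highest log-posterior for this context."""
        vocab_size = len(self.vocab)
        best_sense, best_score = None, -math.inf
        for sense, count in self.sense_counts.items():
            score = math.log(count / self.total)  # log prior
            denom = sum(self.word_counts[sense].values()) + self.alpha * vocab_size
            for word in context:
                score += math.log((self.word_counts[sense][word] + self.alpha) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Toy data (invented): neighbors of "bank" mapped by hand to two senses.
neighbor_to_sense = {
    "shore": "bank/river",
    "riverbank": "bank/river",
    "lender": "bank/finance",
}
corpus = [
    "we walked along the shore near the water".split(),
    "fish swam close to the riverbank in shallow water".split(),
    "the lender approved the loan and the deposit".split(),
    "a lender charges interest on every loan".split(),
]

examples = build_training_set(corpus, neighbor_to_sense)
clf = NaiveBayesWSD().fit(examples)

# Disambiguate new occurrences of the target word "bank" by their contexts.
print(clf.predict("the bank raised interest on my loan".split()))  # bank/finance
print(clf.predict("shallow water lapped at the bank".split()))     # bank/river
```

Each occurrence of a neighbor contributes its surrounding words as a training example for the sense that neighbor is assumed to indicate; the classifier trained on these examples is then applied to contexts of the ambiguous target word itself, so no hand-labeled instances of the target are needed.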
