An Empirical Evaluation of Knowledge Sources and Learning Algorithms for Word Sense Disambiguation

In this paper, we evaluate a variety of knowledge sources and supervised learning algorithms for word sense disambiguation on SENSEVAL-2 and SENSEVAL-1 data. Our knowledge sources include the parts of speech of neighboring words, single words in the surrounding context, local collocations, and syntactic relations. The learning algorithms evaluated include Support Vector Machines (SVM), Naive Bayes, AdaBoost, and decision tree algorithms. We present empirical results showing the relative contribution of the component knowledge sources and the different learning algorithms. In particular, combining all of these knowledge sources with SVM (i.e., a single learning algorithm) achieves accuracy higher than the best official scores on both SENSEVAL-2 and SENSEVAL-1 test data.
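As an illustration of how three of the knowledge sources named above might be turned into classifier features, here is a minimal Python sketch. It is not the authors' implementation: the function name `extract_features`, the feature naming scheme, the window size, the collocation offsets, and the `<pad>` symbol are all assumptions chosen for the example. Syntactic relations are omitted because they would require a parser.

```python
from typing import Dict, List, Sequence, Tuple

PAD = "<pad>"  # hypothetical symbol for positions outside the sentence


def extract_features(
    tokens: List[str],
    pos_tags: List[str],
    target: int,
    pos_window: int = 3,  # assumed window size, not taken from the paper
    collocations: Sequence[Tuple[int, int]] = (
        (-1, -1), (1, 1), (-2, -1), (1, 2), (-1, 1),
    ),  # assumed offset ranges, relative to the target word
) -> Dict[str, str]:
    """Build a feature dictionary for the ambiguous word at index `target`."""
    if len(tokens) != len(pos_tags):
        raise ValueError("tokens and pos_tags must have the same length")
    n = len(tokens)

    def at(seq: List[str], i: int) -> str:
        # Return the item at position i, or PAD when i is out of range.
        return seq[i] if 0 <= i < n else PAD

    features: Dict[str, str] = {}

    # Knowledge source 1: part of speech of neighboring words.
    for off in range(-pos_window, pos_window + 1):
        features[f"pos[{off}]"] = at(pos_tags, target + off)

    # Knowledge source 2: single words in the surrounding context,
    # recorded as presence features (the target word itself is skipped).
    for i, tok in enumerate(tokens):
        if i != target:
            features[f"word={tok.lower()}"] = "1"

    # Knowledge source 3: local collocations, i.e. ordered word sequences
    # at fixed offsets around the target (offset 0 is left out).
    for start, end in collocations:
        words = [
            at(tokens, target + off).lower()
            for off in range(start, end + 1)
            if off != 0
        ]
        features[f"colloc[{start},{end}]"] = "_".join(words)

    return features


if __name__ == "__main__":
    sentence = ["He", "sat", "on", "the", "bank", "of", "the", "river"]
    tags = ["PRP", "VBD", "IN", "DT", "NN", "IN", "DT", "NN"]
    for name, value in sorted(extract_features(sentence, tags, 4).items()):
        print(name, value)
```

A dictionary of this kind could then be vectorized and passed to any of the learning algorithms listed in the abstract; keeping each knowledge source under its own feature-name prefix makes it straightforward to switch sources on and off when measuring their relative contribution.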
