Semi-Supervised Training of a Kernel PCA-Based Model for Word Sense Disambiguation

We introduce a new semi-supervised learning model for word sense disambiguation (WSD) based on Kernel Principal Component Analysis (KPCA). Experiments show that it can further improve accuracy over supervised KPCA models, which have themselves achieved WSD accuracy superior to the best published individual models. Although supervised KPCA models are empirically significantly more accurate than the state of the art achieved by either naive Bayes or maximum entropy models on Senseval-2 data, we identify specific sparse-data conditions under which they deteriorate to essentially a most-frequent-sense predictor. We discuss the potential of KPCA for leveraging unannotated data in partially unsupervised training to address this problem, leading to a composite model that combines the supervised and semi-supervised models.
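To make the idea of leveraging unannotated data concrete, the following is a minimal sketch, not the model evaluated in the paper. It assumes a polynomial kernel, fits KPCA on labeled and unlabeled context vectors together, and then assigns a sense by nearest labeled neighbor in the projected space. The function names (`poly_kernel`, `fit_kpca`, `project`, `predict_sense`), the toy feature vectors, and the nearest-neighbor decision rule are all illustrative assumptions.

```python
import numpy as np

def poly_kernel(A, B, degree=2):
    """Polynomial kernel (x . y)^degree between rows of A and rows of B."""
    return (A @ B.T) ** degree

def fit_kpca(X, n_components, degree=2):
    """Fit kernel PCA on all context vectors, labeled and unlabeled alike."""
    K = poly_kernel(X, X, degree)
    col_mean = K.mean(axis=0, keepdims=True)
    # Center the kernel matrix in feature space (K is symmetric).
    Kc = K - col_mean - col_mean.T + K.mean()
    vals, vecs = np.linalg.eigh(Kc)               # ascending eigenvalues
    top = np.argsort(vals)[::-1][:n_components]   # keep the largest ones
    vals, vecs = vals[top], vecs[:, top]
    # Scale so that projections are coordinates on unit-norm components.
    alphas = vecs / np.sqrt(np.maximum(vals, 1e-12))
    return {"X": X, "K": K, "alphas": alphas, "degree": degree}

def project(model, Z):
    """Project new context vectors onto the learned components."""
    X, K = model["X"], model["K"]
    Kz = poly_kernel(Z, X, model["degree"])
    Kz_c = (Kz
            - Kz.mean(axis=1, keepdims=True)
            - K.mean(axis=0, keepdims=True)
            + K.mean())
    return Kz_c @ model["alphas"]

def predict_sense(model, X_labeled, senses, Z):
    """Assumed decision rule: nearest labeled neighbor in KPCA space."""
    P_lab = project(model, X_labeled)
    P_new = project(model, Z)
    dist = np.linalg.norm(P_new[:, None, :] - P_lab[None, :, :], axis=2)
    return [senses[i] for i in dist.argmin(axis=1)]

# Toy data (hypothetical): two annotated contexts and two unannotated ones.
X_labeled = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)
senses = ["A", "B"]
X_unlabeled = np.array([[1, 0, 0, 0], [0, 0, 0, 1]], dtype=float)

# The unannotated vectors shape the components even though they carry no sense tag.
model = fit_kpca(np.vstack([X_labeled, X_unlabeled]), n_components=2)
Z = np.array([[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 0, 1]], dtype=float)
predicted = predict_sense(model, X_labeled, senses, Z)
print(predicted)
```

In this sketch the unannotated vectors influence only the kernel eigendecomposition, while sense labels enter only at the final nearest-neighbor step. That is one simple way to use unlabeled data with KPCA and should not be read as the composite model described above.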
