IMPROVING THE PERFORMANCE OF BAYESIAN AND SUPPORT VECTOR CLASSIFIERS IN WORD SENSE DISAMBIGUATION USING POSITIONAL INFORMATION

We explore position-sensitive word models and their realizations in word sense disambiguation tasks with Naive Bayes and Support Vector Machine classifiers. We show that incorporating word positional information in a straightforward way fails, on average, to improve the performance of either method. However, a kernel designed to take word positions into account yields a statistically significant improvement in classification performance. For Support Vector Machines, we apply this kernel in place of the ordinary Bag-of-Words kernel; for the Bayes classifier, we use it for smoothed density estimation. We discuss the benefits and drawbacks of position-sensitive and kernel-smoothed models, and we analyze and evaluate their effects on a subset of the Senseval-3 data.
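
To make the contrast between a Bag-of-Words kernel and a position-aware kernel concrete, here is a minimal Python sketch. It is not the kernel evaluated in this work, whose exact form the abstract does not give. As an illustrative assumption, it weights each pair of matching words by a Gaussian function of the difference between their offsets from the ambiguous word. The function names (`positions_by_word`, `bow_kernel`, `smoothed_position_kernel`) and the bandwidth parameter `sigma` are hypothetical.

```python
import math
from collections import defaultdict


def positions_by_word(context, target_index):
    """Map each context word to its offsets relative to the ambiguous word."""
    offsets = defaultdict(list)
    for i, word in enumerate(context):
        if i != target_index:
            offsets[word].append(i - target_index)
    return offsets


def bow_kernel(a, b):
    """Ordinary Bag-of-Words kernel: positions are ignored, only counts matter."""
    return float(sum(len(a[w]) * len(b[w]) for w in a if w in b))


def smoothed_position_kernel(a, b, sigma=2.0):
    """Assumed position-aware kernel (illustration only).

    Each pair of identical words contributes a Gaussian weight that decays
    with the difference between their offsets. A very small sigma counts only
    exact position matches; a very large sigma recovers the Bag-of-Words kernel.
    """
    total = 0.0
    for w in a:
        if w in b:
            for p in a[w]:
                for q in b[w]:
                    total += math.exp(-((p - q) ** 2) / (2.0 * sigma ** 2))
    return total


# Two toy contexts for the ambiguous word "bank" (index 1 in both).
a = positions_by_word(["the", "bank", "of", "the", "river"], 1)
b = positions_by_word(["the", "bank", "is", "closed"], 1)

print(bow_kernel(a, b))                # counts "the" twice, regardless of position
print(smoothed_position_kernel(a, b))  # the distant "the" contributes less
```

In this sketch the bandwidth interpolates between the two extremes named in the abstract: strict position matching, which makes the features sparse, and the position-free Bag-of-Words representation. A Gram matrix built from such a function could be passed to an SVM implementation that accepts precomputed kernels, and the same Gaussian weighting could in principle smooth per-position word counts for a Naive Bayes model; both uses are assumptions about how such a kernel might be applied, not a description of the experiments reported here.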
