Optimizing feature set for Chinese Word Sense Disambiguation

This article describes the implementation of the I2R word sense disambiguation system (I2R-WSD), which participated in one Senseval-3 task: the Chinese lexical sample task. Our core algorithm is a supervised Naive Bayes (NB) classifier. The classifier uses an optimal feature set, determined by maximizing the cross-validated accuracy of the NB classifier on the training data. The optimal feature set includes part-of-speech tags with position information in the local context and a bag of words in the topical context.
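
The selection procedure summarized above can be illustrated with a minimal sketch. It is not the I2R-WSD implementation: the feature group names (`pos_local`, `word_local`, `bow_topical`), the window size, the add-one smoothing, the exhaustive search over subsets, and the tiny English toy data are all assumptions made for illustration. The sketch only shows the general idea of scoring each candidate feature set by the cross-validated accuracy of a Naive Bayes classifier and keeping the best one.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical candidate feature groups; the names are illustrative only.
CANDIDATES = ["pos_local", "word_local", "bow_topical"]

def extract(instance, groups, window=2):
    """instance = (tokens, pos_tags, target_index); returns feature strings."""
    tokens, tags, i = instance
    feats = []
    for off in range(-window, window + 1):
        j = i + off
        if off == 0 or not 0 <= j < len(tokens):
            continue
        if "pos_local" in groups:   # POS tag with its position offset
            feats.append(f"pos[{off:+d}]={tags[j]}")
        if "word_local" in groups:  # surface word with its position offset
            feats.append(f"word[{off:+d}]={tokens[j]}")
    if "bow_topical" in groups:     # unordered words from the whole context
        feats.extend(sorted({f"bow={w}" for k, w in enumerate(tokens) if k != i}))
    return feats

def train_nb(data, groups):
    """Collect sense counts and per-sense feature counts."""
    sense_counts, feat_counts, vocab = Counter(), defaultdict(Counter), set()
    for inst, sense in data:
        sense_counts[sense] += 1
        for f in extract(inst, groups):
            feat_counts[sense][f] += 1
            vocab.add(f)
    return sense_counts, feat_counts, vocab

def predict_nb(model, inst, groups):
    """Return the sense with the highest add-one-smoothed log posterior."""
    sense_counts, feat_counts, vocab = model
    total = sum(sense_counts.values())
    best, best_score = None, -math.inf
    for sense in sorted(sense_counts):
        denom = sum(feat_counts[sense].values()) + len(vocab)
        score = math.log(sense_counts[sense] / total)
        for f in extract(inst, groups):
            if f in vocab:  # features unseen in training are ignored
                score += math.log((feat_counts[sense][f] + 1) / denom)
        if score > best_score:
            best, best_score = sense, score
    return best

def cv_accuracy(data, groups, k=3):
    """k-fold cross-validated accuracy of NB restricted to the given groups."""
    correct = 0
    for fold in range(k):
        train = [d for idx, d in enumerate(data) if idx % k != fold]
        model = train_nb(train, groups)
        correct += sum(predict_nb(model, inst, groups) == sense
                       for inst, sense in data[fold::k])
    return correct / len(data)

def select_features(data, candidates, k=3):
    """Exhaustively pick the non-empty subset with the best CV accuracy."""
    best, best_acc = None, -1.0
    for r in range(1, len(candidates) + 1):
        for subset in combinations(candidates, r):
            acc = cv_accuracy(data, set(subset), k)
            if acc > best_acc:
                best, best_acc = subset, acc
    return best, best_acc

# Invented toy data (English, for readability); not from the Senseval-3 corpus.
TOY_DATA = [
    ((["he", "caught", "a", "bass", "today"], ["PN", "VV", "DT", "NN", "NT"], 3), "fish"),
    ((["the", "bass", "swam", "away"], ["DT", "NN", "VV", "AD"], 1), "fish"),
    ((["she", "plays", "bass", "loudly"], ["PN", "VV", "NN", "AD"], 2), "music"),
    ((["a", "loud", "bass", "guitar"], ["DT", "JJ", "NN", "NN"], 2), "music"),
    ((["fresh", "bass", "for", "dinner"], ["JJ", "NN", "P", "NN"], 1), "fish"),
    ((["the", "bass", "player", "sang"], ["DT", "NN", "NN", "VV"], 1), "music"),
]

best_groups, best_accuracy = select_features(TOY_DATA, CANDIDATES)
print(best_groups, best_accuracy)
```

Exhaustive search is feasible here only because there are three candidate groups; with many candidates, a greedy forward or backward search over the same cross-validation score would be a common substitute.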