Integrating Collocation Features in Chinese Word Sense Disambiguation

Feature selection is critical in Word Sense Disambiguation (WSD) because features supply the discriminative information a classifier relies on, and uninformative features degrade its performance. Building on strong evidence that an ambiguous word expresses a unique sense within a given collocation, this paper reports experiments on automatic WSD that use collocations as local features, evaluated on a corpus extracted from People’s Daily News (PDN) and on the standard SENSEVAL-3 data set. With Naive Bayes as the core algorithm, we implemented a classifier whose feature set combines local collocation features with topical features. On the PDN corpus, average precision improves by 3.2% over the 81.5% of the baseline system, which does not use collocation features. On the SENSEVAL-3 data, integrating collocation features into the contextual features yields a precision of 37.6%, a 37% improvement over the 26.7% precision of the baseline system. Our experiments also show that collocation features can reduce the size of the human-tagged corpus required.
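The idea of combining topical and local collocation features in a Naive Bayes classifier can be illustrated with a minimal sketch. This is not the authors' implementation: the feature extractor, the window size, the add-one smoothing, the `NaiveBayesWSD` class, and the English toy data are all assumptions made for illustration, and the collocation list is hand-written here rather than extracted automatically.

```python
import math
from collections import Counter, defaultdict

def extract_features(tokens, target_index, collocations, window=2):
    """Return topical (bag-of-words) plus local collocation features."""
    features = []
    # Topical features: every context word other than the target itself.
    for i, tok in enumerate(tokens):
        if i != target_index:
            features.append(("topic", tok))
    # Local features: known collocates found within the window (assumed size).
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    for i in range(lo, hi):
        if i != target_index and tokens[i] in collocations:
            features.append(("colloc", tokens[i]))
    return features

class NaiveBayesWSD:
    """Hypothetical multinomial Naive Bayes with add-one smoothing."""

    def __init__(self):
        self.sense_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, examples):
        for features, sense in examples:
            self.sense_counts[sense] += 1
            for f in features:
                self.feature_counts[sense][f] += 1
                self.vocab.add(f)

    def predict(self, features):
        total = sum(self.sense_counts.values())
        best_sense, best_score = None, float("-inf")
        for sense, count in self.sense_counts.items():
            # log P(sense) + sum of log P(feature | sense)
            score = math.log(count / total)
            denom = sum(self.feature_counts[sense].values()) + len(self.vocab)
            for f in features:
                score += math.log((self.feature_counts[sense][f] + 1) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense

# Toy data (assumed): the ambiguous target word is "bank".
COLLOCATIONS = {"river", "loan", "account"}
TRAINING = [
    (["deposit", "money", "bank", "account"], 2, "finance"),
    (["bank", "loan", "interest"], 0, "finance"),
    (["river", "bank", "water"], 1, "river"),
    (["fish", "near", "river", "bank"], 3, "river"),
]

clf = NaiveBayesWSD()
clf.train(
    [(extract_features(toks, idx, COLLOCATIONS), sense)
     for toks, idx, sense in TRAINING]
)

def disambiguate(tokens, target_index):
    return clf.predict(extract_features(tokens, target_index, COLLOCATIONS))
```

In this sketch a collocate such as "river" contributes twice, once as a topical feature and once as a collocation feature, which is one simple way to let collocations carry extra weight. Calling `disambiguate(["walk", "river", "bank"], 2)` selects the sense whose smoothed log-probability is highest.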
