Learning topic knowledge to improve Chinese word sense disambiguation

This paper addresses the problem of incorporating topic knowledge to improve Chinese word sense disambiguation. The key question is how to learn topic knowledge and use it as features in classifiers that disambiguate word senses. This paper presents two solutions for learning topic knowledge. The first uses a Chinese domain knowledge dictionary, NEUKD, to generate a domain feature set; because NEUKD has limited coverage, a constrained clustering algorithm is adopted to expand the dictionary. The second builds a topic feature set by applying the Latent Dirichlet Allocation (LDA) algorithm to a large-scale unlabeled corpus. Experiments on the SENSEVAL-3 Chinese dataset demonstrate that integrating topic knowledge improves the performance of Chinese word sense disambiguation.
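To make the first solution concrete, the following is a minimal sketch of how a domain dictionary could be turned into classifier features. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not describe the NEUKD entry format, so `TOY_DOMAIN_DICT`, the function `domain_features`, the `DOMAIN=` feature naming, and the example sentence are all hypothetical.

```python
from collections import Counter

# Hypothetical toy stand-in for a domain knowledge dictionary (word -> domain label).
# The real NEUKD entries and their format are not given in the abstract.
TOY_DOMAIN_DICT = {
    "银行": "finance",
    "贷款": "finance",
    "利率": "finance",
    "河流": "geography",
    "岸边": "geography",
}


def domain_features(context_words, domain_dict):
    """Count how often each domain label occurs among the context words."""
    counts = Counter(domain_dict[w] for w in context_words if w in domain_dict)
    # Prefix the feature names so they can be mixed with other feature types.
    return {f"DOMAIN={label}": count for label, count in counts.items()}


# Segmented context surrounding an ambiguous target word (assumed example).
context = ["他", "去", "银行", "申请", "贷款"]
features = domain_features(context, TOY_DOMAIN_DICT)
print(features)
```

Words missing from the dictionary contribute no feature at all, which is the coverage limitation that motivates expanding the dictionary.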
