Evaluating the Word Sense Disambiguation Performance of Statistical Machine Translation

We present the first known empirical test of an increasingly common speculative claim: that statistical machine translation (SMT) models already disambiguate word senses well enough that dedicated word sense disambiguation (WSD) models add little. We evaluate a representative Chinese-to-English SMT model directly on WSD performance, using standard WSD evaluation methodology and datasets from the Senseval-3 Chinese lexical sample task. Much effort has been put into designing and evaluating dedicated WSD models, in particular through the Senseval series of workshops. At the same time, recent improvements in the BLEU scores of SMT systems suggest that SMT models are good at predicting the right translation of the words in source-language sentences. Surprisingly, however, the WSD accuracy of SMT models has never been evaluated and compared with that of dedicated WSD models. We present controlled experiments showing that the WSD accuracy of current typical SMT models is significantly lower than that of all the dedicated WSD models considered. This tends to support the view that, despite recent speculative claims to the contrary, current SMT models do have limitations compared with dedicated WSD models, and that SMT should benefit from the better predictions made by WSD models.

The authors would like to thank the Hong Kong Research Grants Council (RGC) for supporting this research in part through grants RGC6083/99E, RGC6256/00E, and DAG03/04.EG09.
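Scoring an SMT model "directly on WSD performance" means turning its lexical choices into sense predictions and scoring them like any other WSD system. The following is a minimal sketch of that idea under stated assumptions, not the authors' actual procedure. It assumes a hypothetical hand-built table (`translation_to_sense`) mapping English translations of a target word to sense labels, one sense per translation, and treats an untranslated or unmapped target word as an abstention. Precision and recall then follow the usual Senseval definitions: correct over attempted, and correct over all test instances. The function names, sense labels, and toy data are illustrative only.

```python
from typing import Dict, Optional, Tuple


def smt_predictions(
    smt_translations: Dict[str, Optional[str]],
    translation_to_sense: Dict[str, str],
) -> Dict[str, Optional[str]]:
    """Turn the SMT system's lexical choice for each test instance into a sense label.

    smt_translations maps an instance id to the English word the SMT output
    aligned to the target Chinese word (None if it was left untranslated).
    translation_to_sense is a HYPOTHETICAL hand-built table from English
    translations to sense labels; each translation is assumed to signal one sense.
    A missing or unmapped translation becomes None, i.e. an abstention.
    """
    predictions: Dict[str, Optional[str]] = {}
    for instance_id, translation in smt_translations.items():
        if translation is None:
            predictions[instance_id] = None
        else:
            predictions[instance_id] = translation_to_sense.get(translation.lower())
    return predictions


def senseval_scores(
    gold: Dict[str, str], predictions: Dict[str, Optional[str]]
) -> Tuple[float, float]:
    """Return (precision, recall) with precision = correct / attempted
    and recall = correct / total, counting None as an abstention."""
    total = len(gold)
    attempted = sum(1 for i in gold if predictions.get(i) is not None)
    correct = sum(1 for i, sense in gold.items() if predictions.get(i) == sense)
    precision = correct / attempted if attempted else 0.0
    recall = correct / total if total else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data for one hypothetical target word with senses S1 and S2.
    gold = {"d1": "S1", "d2": "S2", "d3": "S1", "d4": "S2"}
    translation_to_sense = {"play": "S1", "hit": "S2", "beat": "S2"}
    smt_translations = {"d1": "play", "d2": "play", "d3": None, "d4": "Hit"}
    p, r = senseval_scores(gold, smt_predictions(smt_translations, translation_to_sense))
    print(f"precision={p:.3f} recall={r:.3f}")  # precision=0.667 recall=0.500
```

Keeping the mapping step separate from the scoring step means the same scorer can be applied unchanged to dedicated WSD models, which output sense labels directly, so both kinds of system are compared under one metric.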
