Word Sense Disambiguation vs. Statistical Machine Translation

We directly investigate a subject of much recent debate: do word sense disambiguation models improve statistical machine translation quality? We present empirical results casting doubt on this common, but unproved, assumption. Using a state-of-the-art Chinese word sense disambiguation model to choose translation candidates for a typical IBM statistical MT system, we find that word sense disambiguation does not yield significantly better translation quality than the statistical machine translation system alone. Error analysis suggests several key factors behind this surprising finding, including inherent limitations of current statistical MT architectures.
