The Trouble with SMT Consistency

Statistical machine translation (SMT) typically models translation at the sentence level, ignoring wider document context. Does this hurt the consistency of translated documents? Using a phrase-based SMT system in various data conditions, we show that SMT translates documents remarkably consistently, even without document-level knowledge. Nevertheless, translation inconsistencies often indicate translation errors. Unlike in human translation, however, these errors are rarely due to terminology inconsistency; more often they are symptoms of deeper issues with SMT models.
