Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation

We describe Docent, an open-source decoder for statistical machine translation (SMT) that breaks with the usual sentence-by-sentence paradigm and translates complete documents as units. By taking translation to the document level, our decoder can handle feature models with arbitrary discourse-wide dependencies and constitutes an essential infrastructure component in the quest for discourse-aware SMT models.
