Forced Derivations for Hierarchical Machine Translation

We present an efficient framework for estimating the rule probabilities of a hierarchical phrase-based statistical machine translation system from parallel data. In previous work, this was done with bilingual parsing. We use a more efficient approach that splits the bilingual parsing into two stages, which allows us to train a hierarchical translation model on larger tasks. Furthermore, we apply leave-one-out to counteract over-fitting and use the expected counts from the inside-outside algorithm to prune the rule set. On the WMT12 Europarl German→English and French→English tasks, we improve translation quality by up to 1.0 BLEU and 0.9 TER while simultaneously reducing the rule set to 5% of its original size.
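
The count-based pruning step can be illustrated with a minimal sketch. It assumes that each rule already has an expected count (for example, accumulated from the inside-outside algorithm), that rules below a fixed threshold are discarded, and that the surviving counts are renormalized per source side. The function name, the rule representation, the threshold value, and the toy counts are all hypothetical; the abstract does not specify the exact pruning criterion or normalization.

```python
from collections import defaultdict


def prune_and_normalize(expected_counts, threshold):
    """Drop rules with a low expected count, then renormalize.

    expected_counts maps (source_side, target_side) -> expected count.
    Both the threshold and the per-source-side normalization are
    illustrative assumptions, not the paper's stated procedure.
    """
    # Keep only rules whose expected count reaches the threshold.
    kept = {rule: c for rule, c in expected_counts.items() if c >= threshold}

    # Sum the surviving counts for each source side.
    totals = defaultdict(float)
    for (src, _tgt), c in kept.items():
        totals[src] += c

    # Relative-frequency estimate p(target | source) over surviving rules.
    return {(src, tgt): c / totals[src] for (src, tgt), c in kept.items()}


# Toy expected counts for three hypothetical hierarchical rules.
counts = {
    ("X1 Haus", "X1 house"): 3.2,
    ("X1 Haus", "X1 home"): 0.8,
    ("X1 Haus", "house X1"): 0.05,
}
probs = prune_and_normalize(counts, threshold=0.1)
```

In this toy example the rarely used third rule is removed and the probability mass is redistributed over the two remaining rules, which is the general effect that count-based pruning has on the rule set.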
