Pre-ordering of phrase-based machine translation input in translation workflow

Word reordering is a difficult task for decoders when the languages involved differ significantly in syntax. Phrase-based statistical machine translation (PBSMT), preferred in commercial settings because of its maturity, is particularly prone to errors in long-range reordering. Source sentence pre-ordering, applied as a pre-processing step before PBSMT, has proved to be an efficient solution that can be achieved with limited resources. We propose a dependency-based pre-ordering model, with parameters optimized using a reordering score, to pre-order the source sentence. The pre-ordered sentence is then translated using an existing phrase-based system. The proposed solution is very simple to implement. It uses a hierarchical phrase-based statistical machine translation (HPBSMT) system for pre-ordering, combined with a PBSMT system for the actual translation. We show that the system can provide alternative translations that require less post-editing effort in a translation workflow with German as the source language.
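
To make the notion of a reordering score concrete, the following is a minimal sketch of one common choice, a normalized Kendall's tau over a word-order permutation. The abstract does not say which score is used to optimize the model parameters, so the metric and the function name `kendall_tau_score` are assumptions made purely for illustration.

```python
from itertools import combinations


def kendall_tau_score(permutation):
    """Return a score in [0, 1] for a word-order permutation.

    permutation[i] is the reference (target-side) position of the i-th
    word in the pre-ordered source sentence. A score of 1.0 means the
    order already matches the reference; 0.0 means it is fully reversed.
    This is an illustrative metric, not necessarily the one used in the paper.
    """
    n = len(permutation)
    if n < 2:
        return 1.0  # zero or one word: nothing can be out of order
    # Count word pairs whose relative order disagrees with the reference.
    discordant = sum(
        1 for i, j in combinations(range(n), 2)
        if permutation[i] > permutation[j]
    )
    total_pairs = n * (n - 1) // 2
    return 1.0 - discordant / total_pairs


if __name__ == "__main__":
    print(kendall_tau_score([0, 1, 2, 3]))  # monotone order
    print(kendall_tau_score([3, 2, 1, 0]))  # fully reversed order
```

A pre-ordering model tuned against a score of this kind would prefer source permutations that bring the word order closer to that of the target language, which is the long-range reordering a phrase-based decoder tends to handle poorly.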
