Improving Reordering with Linguistically Informed Bilingual n-grams

We present a new reordering model estimated as a standard n-gram language model over units built from morpho-syntactic information of the source and target languages. It can be seen as a model that translates the morpho-syntactic structure of the input sentence, in contrast to standard translation models, which operate on surface word forms. Because such units are less sparse than standard translation units, we can increase the size of the bilingual context considered during translation and thus account effectively for mid-range reorderings. Empirical results on French-English and German-English translation tasks show that our model achieves higher translation accuracy than the widely used lexicalized reordering model.
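As a rough illustration of the idea, the sketch below estimates an n-gram model over bilingual units made of part-of-speech tags rather than words. It is only a minimal sketch under several assumptions that are not taken from the paper: the unit encoding (`to_unit`, with `_` and `|||` separators), the tag names, the toy corpus, and the unsmoothed maximum-likelihood estimate are all hypothetical choices made for the example.

```python
from collections import Counter

BOS, EOS = "<s>", "</s>"  # sentence boundary symbols (assumed convention)


def to_unit(src_pos, tgt_pos):
    """Encode one bilingual unit as 'SRC_TAGS|||TGT_TAGS' (hypothetical format)."""
    return "_".join(src_pos) + "|||" + "_".join(tgt_pos)


def train_ngram(sequences, n=2):
    """Count n-grams and their contexts over sequences of bilingual units."""
    ngrams, contexts = Counter(), Counter()
    for seq in sequences:
        padded = [BOS] * (n - 1) + list(seq) + [EOS]
        for i in range(n - 1, len(padded)):
            context = tuple(padded[i - n + 1:i])
            ngrams[context + (padded[i],)] += 1
            contexts[context] += 1
    return ngrams, contexts


def prob(unit, context, ngrams, contexts):
    """Unsmoothed maximum-likelihood estimate of P(unit | context)."""
    context = tuple(context)
    if contexts[context] == 0:
        return 0.0
    return ngrams[context + (unit,)] / contexts[context]


# Toy corpus: each sentence pair is a sequence of POS-level bilingual units.
# The second unit of the first sentence swaps noun and adjective order.
det = to_unit(["DET"], ["DT"])
swap = to_unit(["NOUN", "ADJ"], ["JJ", "NN"])
noun = to_unit(["NOUN"], ["NN"])
verb = to_unit(["VERB"], ["VBZ"])
corpus = [[det, swap, verb], [det, noun, verb]]

ngrams, contexts = train_ngram(corpus, n=2)
p_swap = prob(swap, [det], ngrams, contexts)
```

Here `p_swap` is the estimated probability that a determiner unit is followed by the noun-adjective swap unit. Because tag-level units recur far more often than word-level ones, a larger `n` remains trainable, which is the intuition behind using a wider bilingual context; a practical system would also need a smoothed estimator rather than the raw counts used here.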
