Chunk-Based Verb Reordering in VSO Sentences for Arabic-English Statistical Machine Translation

In Arabic-to-English phrase-based statistical machine translation, many syntactic disfluencies stem from incorrect long-range reordering of the verb in VSO sentences, where the verb appears earlier than it would in English word order. In this paper, we propose a chunk-based reordering technique that automatically detects and displaces clause-initial verbs on the Arabic side of a word-aligned parallel corpus. This method is applied to preprocess the training data and to collect statistics about verb movements. Based on this analysis, specific verb reordering lattices are then built for the test sentences before decoding. Applying our reordering methods to the training and test sets yields consistent BLEU score improvements on the NIST-MT 2009 Arabic-English benchmark.
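As a rough illustration of the idea of displacing a clause-initial verb chunk, the following Python sketch enumerates alternative chunk orders in which the verb is moved rightwards past one or more following chunks. It is not the procedure used in the paper: the chunk labels (`VC`, `NC`, `PC`), the `max_shift` limit, the function names, and the English placeholder tokens are all assumptions made for the example.

```python
from typing import List, Tuple

# A chunk is a (label, tokens) pair. The labels "VC" (verb chunk),
# "NC" (noun chunk) and "PC" (prepositional chunk) are assumed here.
Chunk = Tuple[str, List[str]]


def verb_reorderings(chunks: List[Chunk], max_shift: int = 3) -> List[List[Chunk]]:
    """Return chunk orders with the clause-initial verb chunk moved right.

    The verb chunk is placed after the first k following chunks, for
    k = 1 .. max_shift (hypothetical limit). Returns an empty list when
    the sentence does not start with a verb chunk.
    """
    if not chunks or chunks[0][0] != "VC":
        return []
    verb, rest = chunks[0], chunks[1:]
    options = []
    for k in range(1, min(max_shift, len(rest)) + 1):
        options.append(rest[:k] + [verb] + rest[k:])
    return options


def flatten(chunks: List[Chunk]) -> List[str]:
    """Concatenate the tokens of all chunks in order."""
    return [tok for _, toks in chunks for tok in toks]


# Placeholder tokens stand in for Arabic words in VSO order.
sentence = [
    ("VC", ["VERB"]),
    ("NC", ["SUBJ1", "SUBJ2"]),
    ("PC", ["PREP", "OBJ"]),
]

for option in verb_reorderings(sentence, max_shift=2):
    print(" ".join(flatten(option)))
```

In a lattice-based setup, each returned order could become an alternative path alongside the original word order, leaving the decoder to choose among them.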
