Preordering for Chinese-Vietnamese Statistical Machine Translation

Word order is one of the most significant differences between Chinese and Vietnamese. In phrase-based statistical machine translation, the reordering model learns reordering rules from bilingual corpora. If the bilingual corpora are large and of good quality, the learned rules are accurate and have broad coverage. However, Chinese-Vietnamese is a low-resource language pair, so the extraction of reordering rules is limited, and as a result the quality of reordering in Chinese-Vietnamese machine translation is not high. In this paper, we combine Chinese dependency relations with Chinese-Vietnamese word alignment results to preorder Chinese sentences so that their word order is closer to that of Vietnamese. The experimental results show that our method improves translation performance compared with a translation system that uses only the reordering models of phrase-based statistical machine translation.

Key words: Chinese-Vietnamese machine translation, preordering, word alignment, Chinese grammatical relations, dependency relations
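The general idea of combining dependency relations with word alignments for preordering can be illustrated with a minimal sketch. It is not the paper's actual rule set or algorithm: the function names `learn_swap_labels` and `preorder`, the dependency labels, the 0.5 threshold, and the toy sentence are all assumptions made for illustration. The sketch counts, for each dependency label, how often the head-dependent order is inverted on the target side according to the alignment, and then moves dependents with frequently inverted labels to the other side of their head.

```python
from collections import defaultdict


def learn_swap_labels(examples, threshold=0.5):
    """Collect dependency labels whose head/dependent order is usually inverted.

    examples: iterable of (arcs, alignment) pairs, where arcs is a list of
    (head_index, dependent_index, label) and alignment maps a source token
    index to a target token index (hypothetical one-to-one alignment).
    """
    swapped = defaultdict(int)
    total = defaultdict(int)
    for arcs, alignment in examples:
        for head, dep, label in arcs:
            if head not in alignment or dep not in alignment:
                continue  # skip arcs with an unaligned word
            total[label] += 1
            source_dep_first = dep < head
            target_dep_first = alignment[dep] < alignment[head]
            if source_dep_first != target_dep_first:
                swapped[label] += 1
    return {lab for lab in total if swapped[lab] / total[lab] > threshold}


def preorder(tokens, arcs, swap_labels):
    """Linearize the dependency tree, flipping dependents whose label is in swap_labels."""
    children = defaultdict(list)
    has_head = set()
    for head, dep, label in arcs:
        children[head].append((dep, label))
        has_head.add(dep)
    roots = [i for i in range(len(tokens)) if i not in has_head]

    def linearize(node):
        before, after = [], []
        for dep, label in sorted(children[node]):
            dep_first = dep < node
            if label in swap_labels:
                dep_first = not dep_first  # move dependent to the other side
            (before if dep_first else after).append(dep)
        order = []
        for dep in before:
            order.extend(linearize(dep))
        order.append(node)
        for dep in after:
            order.extend(linearize(dep))
        return order

    order = []
    for root in roots:
        order.extend(linearize(root))
    return [tokens[i] for i in order]


# Toy example (assumed data): Chinese "我 买 新 书" (I buy new book),
# aligned to a Vietnamese sentence whose order is "I buy book new".
tokens = ["我", "买", "新", "书"]
arcs = [(1, 0, "nsubj"), (1, 3, "dobj"), (3, 2, "amod")]
alignment = {0: 0, 1: 1, 2: 3, 3: 2}

swap_labels = learn_swap_labels([(arcs, alignment)])
print(preorder(tokens, arcs, swap_labels))
```

In this toy case only the adjectival modifier is moved after its noun, so the preordered Chinese sentence follows the target-side order before it is passed to the phrase-based system. A realistic setup would estimate the label statistics over a full aligned corpus and would need to handle one-to-many alignments, which this sketch ignores.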
