Syntax-Based Post-Ordering for Efficient Japanese-to-English Translation

This article proposes a novel reordering method for efficient two-step Japanese-to-English statistical machine translation (SMT). The method separates reordering from lexical translation and performs it after lexical translation. This reordering problem, called post-ordering, is solved as an SMT problem from Head-Final English (HFE) to English. HFE is English whose words have been reordered by syntax-based rules into a head-final, Japanese-like order, and it has been used very successfully for reordering in English-to-Japanese SMT. The proposed method brings this advantage to the reverse direction, Japanese-to-English, and solves the post-ordering problem with accurate syntax-based SMT that uses target-language syntax. Empirically, two-step SMT with the proposed post-ordering reduces the decoding time of accurate but slow syntax-based SMT by using the intermediate HFE as a good approximation. In Japanese-to-English patent translation experiments, the proposed method speeds up syntax-based SMT decoding by a factor of about six while achieving comparable translation accuracy.
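
To make "syntax-based reordered English" concrete, the sketch below shows the core idea behind head finalization: in every constituent of an English parse tree, the head child is moved to the end, which yields a Japanese-like, head-final word order. This is a minimal illustration under stated assumptions, not the article's implementation. The `Node` class, the hand-built example tree, and its head annotations are hypothetical, and any further adjustments the full method may make (for example to articles, particles, or coordination) are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A constituent in a hypothetical head-annotated English parse tree."""
    label: str
    word: Optional[str] = None                           # set only for leaves
    children: List["Node"] = field(default_factory=list)
    head: int = 0                                        # index of the head child


def leaves(node: Node) -> List[str]:
    """Read the words of a tree from left to right."""
    if node.word is not None:
        return [node.word]
    return [w for child in node.children for w in leaves(child)]


def head_finalize(node: Node) -> Node:
    """Return a copy of the tree in which every head child is moved to the
    end of its constituent. The input tree is left unchanged."""
    if node.word is not None:
        return Node(node.label, word=node.word)
    kids = [head_finalize(child) for child in node.children]
    head_child = kids.pop(node.head)   # take the head out ...
    kids.append(head_child)            # ... and put it last
    return Node(node.label, children=kids, head=len(kids) - 1)


# Hypothetical parse of "John hit a ball", with heads marked by hand.
tree = Node("S", head=1, children=[
    Node("NP", head=0, children=[Node("NNP", word="John")]),
    Node("VP", head=0, children=[
        Node("VBD", word="hit"),
        Node("NP", head=1, children=[Node("DT", word="a"),
                                     Node("NN", word="ball")]),
    ]),
])

print(" ".join(leaves(tree)))                 # John hit a ball
print(" ".join(leaves(head_finalize(tree))))  # John a ball hit
```

The verb moves behind its object, mirroring Japanese SOV order. Post-ordering faces the inverse problem: given only a word sequence such as `John a ball hit`, roughly the kind of output the lexical-translation step is expected to produce from Japanese, the HFE-to-English model must recover the English order without a tree in hand, which the article does with syntax-based SMT rather than with hand-written rules like these.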
