Syntax-based Rewriting for Simultaneous Machine Translation

Divergent word order between languages causes delay in simultaneous machine translation. We present a sentence rewriting method that generates more monotonic translations to improve the speed-accuracy tradeoff. We design grammaticality- and meaning-preserving syntactic transformation rules that operate on constituent parse trees. We apply the rules to reference translations to make their word order closer to the source language word order. On Japanese-English translation (two languages with substantially different structure), incorporating the rewritten, more monotonic reference translations into a phrase-based machine translation system yields better and faster translations than a baseline system that uses only gold reference translations.
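The abstract does not spell out the rules themselves. As an illustration only, the following minimal Python sketch shows what one rule of this kind could look like. It assumes Penn-Treebank-style bracketed parses and moves a sentence-final subordinate clause (SBAR) to the front of the sentence, since Japanese places such clauses before the main clause. The rule, the helper names (`parse`, `front_subordinate_clause`, `crossings`), the crossing count used as a monotonicity proxy, and the toy word alignment are all hypothetical; they are not the paper's actual rule set or evaluation.

```python
from typing import Dict, List, Union


class Tree:
    """A constituent: a phrase label plus children (subtrees or word strings)."""

    def __init__(self, label: str, children: List["Node"]) -> None:
        self.label = label
        self.children = children

    def leaves(self) -> List[str]:
        """Return the words under this node, left to right."""
        words: List[str] = []
        for child in self.children:
            if isinstance(child, str):
                words.append(child)
            else:
                words.extend(child.leaves())
        return words


Node = Union[Tree, str]


def parse(text: str) -> Tree:
    """Read a bracketed string such as '(NP (PRP I))' into a Tree."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Node:
        nonlocal pos
        if tokens[pos] != "(":
            pos += 1
            return tokens[pos - 1]  # a bare word
        label = tokens[pos + 1]
        pos += 2  # skip "(" and the label
        children: List[Node] = []
        while tokens[pos] != ")":
            children.append(read())
        pos += 1  # skip ")"
        return Tree(label, children)

    root = read()
    assert isinstance(root, Tree), "expected a bracketed tree"
    return root


def front_subordinate_clause(tree: Tree) -> Tree:
    """Hypothetical rule: (S NP (VP ... SBAR)) -> (S SBAR , NP (VP ...)).

    Applied bottom-up, so nested clauses are rewritten as well.
    """
    children: List[Node] = [
        c if isinstance(c, str) else front_subordinate_clause(c)
        for c in tree.children
    ]
    if (tree.label == "S" and len(children) == 2
            and isinstance(children[0], Tree) and children[0].label == "NP"
            and isinstance(children[1], Tree) and children[1].label == "VP"):
        subj, vp = children
        last = vp.children[-1] if vp.children else None
        # Fire only if the VP keeps at least one child (its verb) after the move.
        if isinstance(last, Tree) and last.label == "SBAR" and len(vp.children) > 1:
            comma = Tree(",", [","])
            return Tree("S", [last, comma, subj, Tree("VP", vp.children[:-1])])
    return Tree(tree.label, children)


def crossings(words: List[str], align: Dict[str, int]) -> int:
    """Count pairs of aligned target words whose source positions are inverted.

    Fewer crossings means word order closer to the source (more monotonic).
    """
    pos = [align[w] for w in words if w in align]
    return sum(1 for i in range(len(pos)) for j in range(i + 1, len(pos))
               if pos[i] > pos[j])


sentence = parse("(S (NP (PRP I)) (VP (VBD stayed) (ADVP (RB home)) "
                 "(SBAR (IN because) (S (NP (PRP it)) (VP (VBD rained))))))")
rewritten = front_subordinate_clause(sentence)
print(" ".join(rewritten.leaves()))  # prints: because it rained , I stayed home

# Hypothetical hand alignment to romanized Japanese "ame ga futta node ie ni ita"
# (source positions 0-6); "I" and "it" are left unaligned for simplicity.
align = {"stayed": 6, "home": 4, "because": 3, "rained": 2}
print(crossings(sentence.leaves(), align), crossings(rewritten.leaves(), align))  # prints: 6 2
```

On this toy example the rewritten reference has fewer alignment crossings than the original, which is the sense in which a rewritten reference is "more monotonic": a simultaneous system can emit the subordinate clause as soon as it has read it, instead of waiting for the sentence-final Japanese verb.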
