Incremental translation using hierarchical phrase-based translation system

Hierarchical phrase-based machine translation [1] (Hiero) is a prominent approach to statistical machine translation that usually performs comparably to or better than conventional phrase-based systems. However, Hiero typically uses the CKY decoding algorithm, which requires the entire input sentence before decoding begins because it produces the translation bottom-up. Left-to-right (LR) decoding [2] is a promising decoding algorithm for Hiero that produces the output translation in left-to-right order. In this paper, we focus on simultaneous translation using the Hiero translation framework. In simultaneous translation, translations are generated incrementally as source-language speech input is processed. We propose a novel approach to incremental translation that integrates segmentation and decoding in LR-Hiero. We compare two incremental decoding algorithms for LR-Hiero and report translation quality (BLEU) and the latency of generating translations for both decoders on audio lectures from the TED collection.

[1] Hermann Ney, et al. The Alignment Template Approach to Statistical Machine Translation, 2004, CL.

[2] Kenneth Heafield, et al. KenLM: Faster and Smaller Language Model Queries, 2011, WMT@EMNLP.

[3] David Chiang, et al. A Hierarchical Phrase-Based Model for Statistical Machine Translation, 2005, ACL.

[4] Srinivas Bangalore, et al. Real-time Incremental Speech-to-Speech Translation of Dialogs, 2012, NAACL.

[5] Haitao Mi, et al. Efficient Incremental Decoding for Tree-to-String Translation, 2010, EMNLP.

[6] Daniel Marcu, et al. A Phrase-Based, Joint Probability Model for Statistical Machine Translation, 2002, EMNLP.

[7] R. Kuhn, et al. Expressive hierarchical rule extraction for left-to-right translation, 2014, AMTA.

[8] Philipp Koehn, et al. Moses: Open Source Toolkit for Statistical Machine Translation, 2007, ACL.

[9] Anoop Sarkar, et al. Two Improvements to Left-to-Right Decoding for Hierarchical Phrase-based Machine Translation, 2014, EMNLP.

[10] Anoop Sarkar, et al. Kriya - An end-to-end Hierarchical Phrase-based MT System, 2012, Prague Bull. Math. Linguistics.

[11] Taro Watanabe, et al. Left-to-Right Target Generation for Hierarchical Phrase-Based Translation, 2006, ACL.

[12] Anoop Sarkar, et al. Incremental Decoding for Phrase-Based Statistical Machine Translation, 2010, WMT@ACL.

[13] David Chiang, et al. Hierarchical Phrase-Based Translation, 2007, CL.

[14] Alexander H. Waibel, et al. Simultaneous translation of lectures and speeches, 2007, Machine Translation.

[15] Jay Earley, et al. An efficient context-free parsing algorithm, 1970, Commun. ACM.

[16] Daniel Marcu, et al. Statistical Phrase-Based Translation, 2003, NAACL.

[17] Tomoki Toda, et al. Optimizing Segmentation Strategies for Simultaneous Speech Translation, 2014, ACL.

[18] Anoop Sarkar, et al. Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering, 2013, EMNLP.

[19] Hermann Ney, et al. Discriminative Training and Maximum Entropy Models for Statistical Machine Translation, 2002, ACL.

[20] Srinivas Bangalore, et al. Incremental Segmentation and Decoding Strategies for Simultaneous Translation, 2013, IJCNLP.

[21] Franz Josef Och, et al. Minimum Error Rate Training in Statistical Machine Translation, 2003, ACL.