Neural Machine Translation Leveraging Phrase-based Models in a Hybrid Search

In this paper, we introduce a hybrid search for attention-based neural machine translation (NMT). A target phrase learned with statistical machine translation (SMT) models extends a hypothesis in the NMT beam search when the attention of the NMT model focuses on the source words translated by this phrase. Phrases added in this way are scored with the NMT model, but also with SMT features, including phrase-level translation probabilities and a target language model. Experimental results on German-to-English news domain and English-to-Russian e-commerce domain translation tasks show that using phrase-based models in the NMT search improves MT quality by up to 2.3% BLEU absolute as compared to a strong NMT baseline.
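To make the hybrid search step more concrete, the sketch below shows one possible way a hypothesis could be extended with SMT phrase candidates when the decoder's attention concentrates on the phrase's source span. This is a minimal illustration under assumed interfaces (the `Hypothesis` class, the `nmt_logprob`/`lm_logprob` scorers, the feature-weight names, and the attention threshold are all hypothetical), not the authors' implementation.

```python
# Minimal sketch of attention-triggered phrase expansion in an NMT beam search.
# All names and interfaces here are illustrative assumptions.
import math
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    tokens: list                                 # target tokens generated so far
    score: float = 0.0                           # accumulated log score
    covered: set = field(default_factory=set)    # source positions already translated


def expand_with_phrases(hyp, attention, phrase_table, nmt_logprob, lm_logprob,
                        weights, attn_threshold=0.5):
    """Return candidate hypotheses built by appending SMT phrase pairs.

    attention    -- attention weights over source positions at the current step
    phrase_table -- dict mapping a source span (i, j) to a list of
                    (target_phrase_tokens, log_phrase_prob) entries
    nmt_logprob  -- callable(prefix, next_tokens) -> log P_NMT (assumed interface)
    lm_logprob   -- callable(prefix, next_tokens) -> log P_LM  (assumed interface)
    weights      -- log-linear feature weights (hypothetical names)
    """
    candidates = []
    for (i, j), entries in phrase_table.items():
        # Trigger the phrase only if attention focuses on its source span
        # and that span has not been translated yet.
        span_attention = sum(attention[i:j])
        if span_attention < attn_threshold or any(p in hyp.covered for p in range(i, j)):
            continue
        for tgt_phrase, log_p_phrase in entries:
            score = (hyp.score
                     + weights["nmt"] * nmt_logprob(hyp.tokens, tgt_phrase)
                     + weights["phrase"] * log_p_phrase
                     + weights["lm"] * lm_logprob(hyp.tokens, tgt_phrase))
            candidates.append(Hypothesis(tokens=hyp.tokens + tgt_phrase,
                                         score=score,
                                         covered=hyp.covered | set(range(i, j))))
    return candidates


if __name__ == "__main__":
    # Toy stand-ins for the NMT and LM scorers (constant per-token log costs).
    nmt = lambda prefix, nxt: -1.0 * len(nxt)
    lm = lambda prefix, nxt: -0.5 * len(nxt)
    table = {(0, 2): [(["the", "house"], math.log(0.8))]}
    hyp = Hypothesis(tokens=["<s>"])
    attn = [0.4, 0.3, 0.2, 0.1]   # attention over four source positions
    for cand in expand_with_phrases(hyp, attn, table, nmt, lm,
                                    weights={"nmt": 1.0, "phrase": 0.5, "lm": 0.5}):
        print(cand.tokens, round(cand.score, 3))
```

In the actual system, such phrase candidates would compete with ordinary word-level NMT expansions inside the same beam; here only the phrase-triggering and log-linear scoring logic is shown.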
