Neural Machine Translation Advised by Statistical Machine Translation

Neural Machine Translation (NMT) is a new approach to machine translation that has made great progress in recent years. However, recent studies show that NMT generally produces fluent but inadequate translations (Tu et al. 2016b; 2016a; He et al. 2016; Tu et al. 2017). This is in contrast to conventional Statistical Machine Translation (SMT), which usually yields adequate but less fluent translations. It is natural, therefore, to leverage the advantages of both models for better translations, and in this work we propose to incorporate the SMT model into the NMT framework. More specifically, at each decoding step, SMT offers additional recommendations of generated words based on the decoding information from NMT (e.g., the generated partial translation and the attention history). We then employ an auxiliary classifier to score the SMT recommendations and a gating function to combine them with the NMT generations, both of which are jointly trained within the NMT architecture in an end-to-end manner. Experimental results on Chinese-English translation show that the proposed approach achieves significant and consistent improvements over state-of-the-art NMT and SMT systems on multiple NIST test sets.
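As a rough illustration of the scoring-and-gating step described above, here is a minimal PyTorch sketch. The class name `SMTAdvisedOutput`, the tensor names (`hidden`, `nmt_logits`, `smt_mask`), and the simple linear scorer are illustrative assumptions, not the authors' actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SMTAdvisedOutput(nn.Module):
    """Gated combination of NMT predictions with SMT recommendations.

    A minimal sketch of the mechanism described in the abstract: an
    auxiliary classifier scores SMT-recommended words, and a scalar
    gate interpolates that distribution with the NMT distribution.
    All names and shapes here are illustrative assumptions.
    """

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        # Auxiliary classifier: re-scores SMT-recommended words
        # conditioned on the current decoder state.
        self.smt_scorer = nn.Linear(hidden_size, vocab_size)
        # Gating function: one scalar per step deciding how much
        # weight the SMT advice receives.
        self.gate = nn.Linear(hidden_size, 1)

    def forward(self, hidden, nmt_logits, smt_mask):
        # hidden:     (batch, hidden_size) decoder state at step t
        # nmt_logits: (batch, vocab_size)  NMT output scores
        # smt_mask:   (batch, vocab_size)  True where SMT recommends a word
        #             (assumes at least one recommendation per row)
        p_nmt = F.softmax(nmt_logits, dim=-1)
        smt_logits = self.smt_scorer(hidden).masked_fill(~smt_mask, float("-inf"))
        p_smt = F.softmax(smt_logits, dim=-1)
        g = torch.sigmoid(self.gate(hidden))  # (batch, 1), in (0, 1)
        return (1.0 - g) * p_nmt + g * p_smt  # mixture still sums to 1


# Toy usage: batch of 2 decoder states over a 6-word vocabulary.
combiner = SMTAdvisedOutput(hidden_size=8, vocab_size=6)
hidden = torch.randn(2, 8)
nmt_logits = torch.randn(2, 6)
smt_mask = torch.tensor([[1, 0, 1, 0, 0, 1],
                         [0, 1, 1, 1, 0, 0]], dtype=torch.bool)
probs = combiner(hidden, nmt_logits, smt_mask)  # each row sums to 1
```

Because the gate and the auxiliary scorer sit inside the NMT network, both can be trained jointly end-to-end as the abstract describes, letting the model learn at each step how much to rely on the SMT advice.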

[1] David Chiang et al. A Hierarchical Phrase-Based Model for Statistical Machine Translation. ACL, 2005.

[2] Hang Li et al. A Deep Memory-based Architecture for Sequence-to-Sequence Learning. 2015.

[3] Rico Sennrich et al. Improving Neural Machine Translation Models with Monolingual Data. ACL, 2015.

[4] Quoc V. Le et al. Addressing the Rare Word Problem in Neural Machine Translation. ACL, 2014.

[5] Bill Byrne et al. Syntactically Guided Neural Machine Translation. ACL, 2016.

[6] Yoshua Bengio et al. Neural Machine Translation by Jointly Learning to Align and Translate. ICLR, 2014.

[7] Wang Ling et al. Character-based Neural Machine Translation. arXiv, 2015.

[9] Yang Liu et al. Modeling Coverage for Neural Machine Translation. ACL, 2016.

[10] Robert L. Mercer et al. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 1993.

[12] Yang Liu et al. Neural Machine Translation with Reconstruction. AAAI, 2016.

[13] Hermann Ney et al. Discriminative Training and Maximum Entropy Models for Statistical Machine Translation. ACL, 2002.

[14] Shi Feng et al. Implicit Distortion and Fertility Models for Attention-based Encoder-Decoder NMT Model. arXiv, 2016.

[15] John DeNero et al. Variable-Length Word Encodings for Neural Translation Models. EMNLP, 2015.

[16] Satoshi Nakamura et al. Incorporating Discrete Translation Lexicons into Neural Machine Translation. EMNLP, 2016.

[18] Yang Liu et al. Context Gates for Neural Machine Translation. TACL, 2016.

[19] Kuldip K. Paliwal et al. Bidirectional Recurrent Neural Networks. IEEE Transactions on Signal Processing, 1997.

[20] Kenneth Heafield. KenLM: Faster and Smaller Language Model Queries. WMT@EMNLP, 2011.

[21] Christopher D. Manning et al. Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models. ACL, 2016.

[22] Yoshua Bengio et al. A Character-level Decoder without Explicit Segmentation for Neural Machine Translation. ACL, 2016.

[23] Gholamreza Haffari et al. Incorporating Structural Alignment Biases into an Attentional Neural Translation Model. NAACL, 2016.

[24] Yaohua Tang et al. Neural Machine Translation with External Phrase Memory. arXiv, 2016.

[25] José A. R. Fonollosa et al. Character-based Neural Machine Translation. ACL, 2016.

[26] Yoshua Bengio et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. EMNLP, 2014.

[29] Jürgen Schmidhuber et al. Long Short-Term Memory. Neural Computation, 1997.

[30] Franz Josef Och. Minimum Error Rate Training in Statistical Machine Translation. ACL, 2003.

[31] Yoshua Bengio et al. On Using Very Large Target Vocabulary for Neural Machine Translation. ACL, 2014.

[32] Hua Wu et al. Improved Neural Machine Translation with SMT Features. AAAI, 2016.

[33] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method. arXiv, 2012.

[35] John DeNero et al. Models and Inference for Prefix-Constrained Machine Translation. ACL, 2016.

[36] Rico Sennrich et al. Neural Machine Translation of Rare Words with Subword Units. ACL, 2015.

[37] Phil Blunsom et al. Recurrent Continuous Translation Models. EMNLP, 2013.

[38] Yoshua Bengio et al. On the Properties of Neural Machine Translation: Encoder–Decoder Approaches. SSST@EMNLP, 2014.

[39] Hang Li et al. Incorporating Copying Mechanism in Sequence-to-Sequence Learning. ACL, 2016.

[40] Daniel Marcu et al. Statistical Phrase-Based Translation. NAACL, 2003.

[41] Quoc V. Le et al. Sequence to Sequence Learning with Neural Networks. NIPS, 2014.