Incorporating Statistical Machine Translation Word Knowledge Into Neural Machine Translation

Neural machine translation (NMT) has attracted increasing attention in recent years, mainly due to its simplicity and state-of-the-art performance. However, previous research has shown that NMT suffers from several word-level limitations: lack of source coverage guidance, poor translation of rare words, and a limited vocabulary, whereas statistical machine translation (SMT) has complementary properties that address these weaknesses well. It is therefore natural to improve translation performance by combining the advantages of the two kinds of models. This paper proposes a general framework for incorporating SMT word knowledge into NMT to alleviate the above word-level limitations. In our framework, the NMT decoder makes more accurate word predictions by referring to SMT word recommendations in both the training and testing phases. Specifically, the SMT model offers informative word recommendations based on the NMT decoding information. We then use the SMT word predictions as prior knowledge to adjust the NMT word generation probability, utilizing a neural-network-based classifier to digest the discrete word knowledge. We implement the framework with two model variants, one with a gating mechanism and the other with a direct competition mechanism. Experimental results on Chinese-to-English and English-to-German translation tasks show that the proposed framework can take advantage of SMT word knowledge and consistently achieves significant improvements over both NMT and SMT baseline systems.
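The gating variant can be sketched as a convex combination of the two models' word distributions. The following is a minimal illustrative example, not the paper's implementation: the gate here is a given scalar, whereas in the framework it is predicted by a neural classifier from the decoder state, and the SMT scores stand in for the recommendations the SMT model produces from the NMT decoding information.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def gated_combination(nmt_logits, smt_scores, gate):
    """Blend NMT and SMT word distributions with a scalar gate in [0, 1]:
        p(y) = gate * p_nmt(y) + (1 - gate) * p_smt(y)
    Both inputs are unnormalized scores over the same target vocabulary.
    """
    p_nmt = softmax(nmt_logits)
    p_smt = softmax(smt_scores)
    return gate * p_nmt + (1.0 - gate) * p_smt

# Toy vocabulary of 5 words: the NMT scores are nearly flat (the decoder
# is unsure), while the SMT recommendation strongly favors word 2.
nmt_logits = np.array([1.0, 1.1, 0.9, 1.0, 1.05])
smt_scores = np.array([0.0, 0.0, 5.0, 0.0, 0.0])
p = gated_combination(nmt_logits, smt_scores, gate=0.6)
print(p.argmax())  # the SMT prior shifts the prediction to word 2
```

Because both inputs are normalized before mixing, the result remains a proper probability distribution; the direct competition variant would instead let the two distributions compete per word rather than interpolate.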
