Neural Machine Translation With GRU-Gated Attention Model

Neural machine translation (NMT) relies heavily on context vectors generated by an attention network to predict target words. In practice, we observe that the context vectors for different target words are quite similar to one another, and translations produced from such nondiscriminatory context vectors tend to be degenerate. We ascribe this similarity to invariant source representations, which lack dynamics across decoding steps. In this article, we propose a novel gated recurrent unit (GRU)-gated attention model (GAtt) for NMT. By updating the source representations with the previous decoder state via a GRU, GAtt produces translation-sensitive source representations that in turn yield discriminative context vectors. We further propose a variant of GAtt that swaps the input order of the source representations and the previous decoder state to the GRU. Experiments on the NIST Chinese–English, WMT14 English–German, and WMT17 English–German translation tasks show that the two GAtt models achieve significant improvements over vanilla attention-based NMT. Further analyses of the attention weights and context vectors demonstrate the effectiveness of GAtt in enhancing the discriminative capacity of the source representations and in alleviating the challenging issue of overtranslation.
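
The mechanism described above can be sketched concretely: at every decoding step, each source annotation is re-encoded by a GRU whose hidden state is set to the previous decoder state, and additive attention is then computed over these step-specific representations to form the context vector. The following is a minimal PyTorch sketch under those assumptions; the class and parameter names (GRUGatedAttention, W_src, W_dec, v) are illustrative and not taken from the authors' implementation.

```python
# Minimal sketch of GRU-gated attention for one decoding step (assumed
# PyTorch implementation; shapes and names are illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GRUGatedAttention(nn.Module):
    def __init__(self, src_dim, dec_dim, attn_dim):
        super().__init__()
        # The GRU cell treats each source annotation as its input and the
        # previous decoder state as the hidden state it updates; the
        # proposed variant feeds the two in the opposite order.
        self.gate = nn.GRUCell(input_size=src_dim, hidden_size=dec_dim)
        self.W_src = nn.Linear(dec_dim, attn_dim, bias=False)
        self.W_dec = nn.Linear(dec_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, src_annotations, prev_dec_state):
        # src_annotations: (batch, src_len, src_dim)
        # prev_dec_state:  (batch, dec_dim)
        batch, src_len, _ = src_annotations.size()
        flat_src = src_annotations.reshape(batch * src_len, -1)
        flat_dec = (prev_dec_state.unsqueeze(1)
                    .expand(-1, src_len, -1)
                    .reshape(batch * src_len, -1))
        # Re-encode every source position against the current decoding
        # step, so the representations change from step to step.
        gated = self.gate(flat_src, flat_dec).view(batch, src_len, -1)
        # Standard additive attention, computed over the gated
        # representations instead of the static encoder outputs.
        scores = self.v(torch.tanh(self.W_src(gated) +
                                   self.W_dec(prev_dec_state).unsqueeze(1)))
        weights = F.softmax(scores, dim=1)        # (batch, src_len, 1)
        context = (weights * gated).sum(dim=1)    # (batch, dec_dim)
        return context, weights.squeeze(-1)
```

In use, a decoder would call this module once per target word with its state from the previous step, so the context vector varies across decoding steps even though the raw encoder outputs stay fixed.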
