Multiway Attention for Neural Machine Translation

Neural machine translation (NMT) with source-side attention has achieved remarkable performance. Nevertheless, existing attention mechanisms each employ only a single attention function, even though several different attention functions have been proposed. These functions have different mechanisms and capture different information about the sentence, so relying on any single one limits performance. In this paper, we propose the multiway attention neural machine translation model (MA-NMT), which employs multiple attention functions in the attention mechanism to calculate the weight of each source word when predicting the next target word. Specifically, we design three attention functions to extract contextual information, and then combine the features from all attention functions to obtain the final semantic representation. Experimental results on the English-German translation task demonstrate that the proposed MA-NMT outperforms the baseline NMT models.
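The abstract does not specify the three attention functions, so the following is only a minimal sketch of the multiway idea, assuming the three widely used scoring functions from the attention literature (dot-product, general/bilinear, and additive) and a simple concatenate-and-project combination; all parameter names (`Wg`, `v`, `Wh`, `Ws`, `Wo`) are hypothetical:

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def dot_score(h, s):
    # dot-product attention: score = h . s
    return h @ s

def general_score(h, s, Wg):
    # general (bilinear) attention: score = h . (Wg s)
    return h @ (Wg @ s)

def additive_score(h, s, v, Wh, Ws):
    # additive attention: score = v . tanh(Wh h + Ws s)
    return v @ np.tanh(Wh @ h + Ws @ s)

def multiway_context(H, s, params):
    """Combine contexts from three attention functions (a sketch,
    not the paper's exact design).
    H: (T, d) source annotations; s: (d,) decoder state.
    Returns a (d,) combined semantic representation."""
    contexts = []
    # each attention function yields its own weights and context vector
    a = softmax(np.array([dot_score(h, s) for h in H]))
    contexts.append(a @ H)
    a = softmax(np.array([general_score(h, s, params['Wg']) for h in H]))
    contexts.append(a @ H)
    a = softmax(np.array([additive_score(h, s, params['v'],
                                         params['Wh'], params['Ws'])
                          for h in H]))
    contexts.append(a @ H)
    # concatenate the three contexts and project back to model size
    return params['Wo'] @ np.concatenate(contexts)
```

A trained model would learn these projection matrices jointly with the rest of the network; here they stand in only to show the data flow from multiple scoring functions to one fused representation.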