Improving Neural Machine Translation by Achieving Knowledge Transfer with Sentence Alignment Learning

Neural Machine Translation (NMT) optimized by Maximum Likelihood Estimation (MLE) provides no guarantee of translation adequacy. To alleviate this problem, we propose an NMT approach that improves translation adequacy by transferring the semantic knowledge learned from bilingual sentence alignment. Specifically, we first design a discriminator that learns to estimate sentence alignment scores over translation candidates, and the learned semantic knowledge is then transferred to the NMT model under an adversarial learning framework. We also propose a gated self-attention based encoder for sentence embedding. Furthermore, an N-pair training loss is introduced into our framework to help the discriminator better capture lexical evidence in translation candidates. Experimental results show that our proposed method outperforms baseline NMT models on Chinese-to-English and English-to-German translation tasks. Further analysis also reveals the semantic knowledge transferred from the discriminator to the NMT model.
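
The abstract mentions two concrete components: a gated self-attention based encoder that produces sentence embeddings, and a multi-class N-pair loss that trains the discriminator on aligned sentence pairs. The following is a minimal, hypothetical sketch (in PyTorch, with illustrative module names and shapes not taken from the paper) of how such components could be wired up; it is not the authors' implementation.

```python
# Hypothetical sketch, assuming a PyTorch implementation: a gated self-attention
# pooling layer for sentence embedding and a multi-class N-pair loss over a
# batch of aligned source/target sentences. Names and shapes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedSelfAttentionPooling(nn.Module):
    """Pool token hidden states into a single sentence vector with a learned gate."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.attn = nn.Linear(hidden_size, 1)             # per-token attention logit
        self.gate = nn.Linear(hidden_size, hidden_size)   # element-wise gate

    def forward(self, hidden: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, hidden_size); mask: (batch, seq_len), 1 = real token
        logits = self.attn(hidden).squeeze(-1)                  # (batch, seq_len)
        logits = logits.masked_fill(mask == 0, float("-inf"))
        weights = torch.softmax(logits, dim=-1).unsqueeze(-1)   # (batch, seq_len, 1)
        gated = torch.sigmoid(self.gate(hidden)) * hidden       # gated token states
        return (weights * gated).sum(dim=1)                     # (batch, hidden_size)


def n_pair_loss(src_emb: torch.Tensor, tgt_emb: torch.Tensor) -> torch.Tensor:
    """Multi-class N-pair loss: each source sentence should score its aligned
    target higher than the other targets in the batch, which act as negatives."""
    # scores[i, j] = similarity between source i and target j
    scores = src_emb @ tgt_emb.t()                     # (batch, batch)
    labels = torch.arange(scores.size(0), device=scores.device)
    # Cross-entropy with the diagonal as gold labels equals the N-pair objective
    # mean_i log(1 + sum_{j != i} exp(s_ij - s_ii)).
    return F.cross_entropy(scores, labels)
```

Under these assumptions, the discriminator would encode both the reference and the candidate translation with the gated pooling layer and be trained with `n_pair_loss`, while the alignment scores it produces would serve as the reward signal transferred to the NMT model in the adversarial stage.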
