Improving neural machine translation with sentence alignment learning

Abstract Neural machine translation (NMT) optimized by maximum likelihood estimation (MLE) usually lacks a guarantee of translation adequacy. To alleviate this problem, we propose an NMT approach that improves translation adequacy by transferring semantic knowledge from bilingual sentence alignment learning. Specifically, we first design a discriminator that learns to estimate sentence alignment scores over translation candidates. The discriminator is built from gated self-attention based sentence encoders and trained with an N-pair loss to better capture lexical evidence from bilingual sentence pairs. We then propose an adversarial training framework, as well as a sentence alignment-aware decoding method, to transfer the discriminator's learned semantic knowledge to NMT models. We conduct experiments on Chinese → English, Uyghur → Chinese and English → German translation tasks. Experimental results show that our proposed methods outperform baseline NMT models on all three translation tasks. Further analysis also reveals the characteristics of our approaches and details the semantic knowledge transferred from the discriminator to the NMT model.
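The N-pair loss mentioned above (Sohn, 2016) treats each source sentence's aligned target as the positive and the other targets in the batch as negatives, which reduces to a softmax cross-entropy over a batch similarity matrix. The sketch below is illustrative only, assuming plain dot-product similarity over precomputed sentence embeddings; the encoder architecture and scoring details of the paper's discriminator are not reproduced here.

```python
import numpy as np

def n_pair_loss(src: np.ndarray, tgt: np.ndarray) -> float:
    """Multi-class N-pair loss over a batch of N source/target
    sentence embeddings (shape (N, d)). Source i's positive is
    target i; the other N-1 targets serve as negatives."""
    # Pairwise similarity matrix: scores[i, j] = src_i . tgt_j
    scores = src @ tgt.T                                  # (N, N)
    # Softmax cross-entropy with the diagonal as the gold class
    scores = scores - scores.max(axis=1, keepdims=True)   # numeric stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))
```

When the aligned pairs score much higher than the mismatched ones, the loss approaches zero; random embeddings yield a loss near log N, so minimizing it pushes the encoder to separate aligned from misaligned sentence pairs.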
