Generative adversarial training for neural machine translation

Abstract Neural machine translation (NMT) is typically optimized to generate sentences that maximize n-gram overlap with the ground-truth target. However, it is widely acknowledged that n-gram precision, a manually designed approximate loss function, may mislead the model into generating suboptimal translations. To address this problem, we train the NMT model to generate human-like translations directly by using a generative adversarial net, which has achieved great success in computer vision. In this paper, we build a conditional sequence generative adversarial net (CSGAN-NMT) comprising two adversarial sub-models: a generative model (generator), which translates the source sentence into the target sentence as traditional NMT models do, and a discriminative model (discriminator), which distinguishes the machine-translated target sentence from the human-translated one. The two sub-models play a minimax game and reach a win-win situation at a Nash equilibrium. As a variant of the single generator-discriminator model, we also propose the multi-CSGAN-NMT, which contains multiple generators and discriminators. In the multi-CSGAN-NMT model, each generator is viewed as an agent that can interact with the others and even exchange messages. Experiments show that the proposed CSGAN-NMT model yields substantial improvements over a strong baseline, and the improvement from the multi-CSGAN-NMT model is even more remarkable.
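The core training idea the abstract describes — a generator rewarded by a discriminator's judgment rather than by n-gram overlap — is commonly implemented with a SeqGAN-style policy gradient, since sampling discrete tokens blocks ordinary backpropagation. The toy sketch below illustrates only that mechanism; the tabular "generator", the hand-coded "discriminator", and all names are illustrative assumptions, not the paper's actual RNN encoder-decoder architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, LEN = 5, 4  # toy vocabulary size and fixed sequence length

# Generator: an independent softmax over the vocabulary at each position
# (a stand-in for a real NMT decoder; purely illustrative).
gen_logits = np.zeros((LEN, VOCAB))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def gen_sample():
    """Sample one discrete token sequence from the generator."""
    probs = softmax(gen_logits)
    return np.array([rng.choice(VOCAB, p=probs[t]) for t in range(LEN)])

# Discriminator stand-in: in this toy, "human" translations are all-zero
# sequences, so the reward is simply the fraction of zero tokens. A real
# discriminator would be a trained classifier over sentence pairs.
def discriminator(seq):
    return (seq == 0).mean()

# REINFORCE update: raise the log-probability of sampled tokens in
# proportion to the discriminator's reward for the whole sequence.
lr = 0.5
for step in range(300):
    seq = gen_sample()
    reward = discriminator(seq)
    probs = softmax(gen_logits)
    for t, tok in enumerate(seq):
        grad = -probs[t]          # d log p(tok) / d logits at position t
        grad[tok] += 1.0
        gen_logits[t] += lr * reward * grad

# After training, generated sequences should look more "human" to the
# discriminator, i.e. earn a higher average reward than the uniform
# starting point (which scores about 0.2 here).
final_reward = np.mean([discriminator(gen_sample()) for _ in range(100)])
print(final_reward)
```

In the full model the discriminator is trained jointly on human and machine translations, so the reward signal sharpens as the generator improves — the minimax game the abstract refers to.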
