Generative adversarial training for neural machine translation

Abstract Neural machine translation (NMT) is typically optimized to generate sentences that maximize n-gram overlap with the ground-truth target. However, it is widely acknowledged that n-gram precision, a manually designed approximate loss function, may mislead the model into generating suboptimal translations. To address this problem, we train the NMT model to generate human-like translations directly by using a generative adversarial net, which has achieved great success in computer vision. In this paper, we build a conditional sequence generative adversarial net (CSGAN-NMT) comprising two adversarial sub-models: a generative model (generator), which translates the source sentence into the target sentence as traditional NMT models do, and a discriminative model (discriminator), which distinguishes the machine-translated target sentence from the human-translated one. The two sub-models play a minimax game and reach a win-win situation at a Nash equilibrium. As a variant of the single generator-discriminator model, we also propose the multi-CSGAN-NMT, which contains multiple generators and discriminators. In the multi-CSGAN-NMT model, each generator is viewed as an agent that can interact with the others and even exchange messages. Experiments show that the proposed CSGAN-NMT model yields substantial improvements over a strong baseline, and the improvement from the multi-CSGAN-NMT model is even more remarkable.
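The core training idea the abstract describes — a generator rewarded by a discriminator's judgment rather than by n-gram overlap — is commonly implemented with a SeqGAN-style policy gradient, since sampling discrete tokens blocks ordinary backpropagation. The toy sketch below illustrates only that mechanism; the tabular "generator", the hand-coded "discriminator", and all names are illustrative assumptions, not the paper's actual RNN encoder-decoder architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, LEN = 5, 4  # toy vocabulary size and fixed sequence length

# Generator: an independent softmax over the vocabulary at each position
# (a stand-in for a real NMT decoder; purely illustrative).
gen_logits = np.zeros((LEN, VOCAB))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def gen_sample():
    """Sample one discrete token sequence from the generator."""
    probs = softmax(gen_logits)
    return np.array([rng.choice(VOCAB, p=probs[t]) for t in range(LEN)])

# Discriminator stand-in: in this toy, "human" translations are all-zero
# sequences, so the reward is simply the fraction of zero tokens. A real
# discriminator would be a trained classifier over sentence pairs.
def discriminator(seq):
    return (seq == 0).mean()

# REINFORCE update: raise the log-probability of sampled tokens in
# proportion to the discriminator's reward for the whole sequence.
lr = 0.5
for step in range(300):
    seq = gen_sample()
    reward = discriminator(seq)
    probs = softmax(gen_logits)
    for t, tok in enumerate(seq):
        grad = -probs[t]          # d log p(tok) / d logits at position t
        grad[tok] += 1.0
        gen_logits[t] += lr * reward * grad

# After training, generated sequences should look more "human" to the
# discriminator, i.e. earn a higher average reward than the uniform
# starting point (which scores about 0.2 here).
final_reward = np.mean([discriminator(gen_sample()) for _ in range(100)])
print(final_reward)
```

In the full model the discriminator is trained jointly on human and machine translations, so the reward signal sharpens as the generator improves — the minimax game the abstract refers to.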
