Exploiting Monolingual Data at Scale for Neural Machine Translation

While target-side monolingual data has proven very useful for improving neural machine translation (NMT) through back-translation, source-side monolingual data has not been well investigated. In this work, we study how to use both source-side and target-side monolingual data for NMT and propose an effective strategy that leverages both. First, we generate synthetic bitext by translating the monolingual data in each language into the other language, using models pretrained on genuine bitext. Next, a model is trained on a noised version of the concatenated synthetic bitext, in which each source sequence is randomly corrupted. Finally, the model is fine-tuned on the genuine bitext together with a clean subset of the synthetic bitext, without adding any noise. Our approach achieves state-of-the-art results on the WMT16, WMT17, and WMT18 English↔German translation tasks and the WMT19 German→French translation task, demonstrating the effectiveness of our method. We also conduct a comprehensive study of how each component of the pipeline contributes.
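Of the three stages, the source-side corruption in the second stage is the easiest to make concrete. The sketch below applies three perturbations commonly used to noise synthetic bitext (word dropout, word blanking, and local shuffling within a small window). It is a minimal illustration, not the paper's implementation: the function name, the probabilities, the filler token, and the window size are all illustrative assumptions.

    import random

    def noise_source(tokens, drop_prob=0.1, blank_prob=0.1, shuffle_k=3,
                     blank_token="<BLANK>"):
        """Randomly corrupt a tokenized source sentence.

        Probabilities and the window size are illustrative defaults,
        not values taken from the paper.
        """
        if not tokens:
            return tokens
        # Word dropout: delete each token with probability drop_prob.
        kept = [t for t in tokens if random.random() >= drop_prob]
        if not kept:  # keep at least one token so the sentence is non-empty
            kept = [random.choice(tokens)]
        # Word blanking: replace each surviving token with a filler token
        # with probability blank_prob.
        kept = [blank_token if random.random() < blank_prob else t for t in kept]
        # Local shuffling: perturb each position by a random offset in
        # [0, shuffle_k + 1) and re-sort, so tokens move at most ~shuffle_k slots.
        keys = [i + random.uniform(0, shuffle_k + 1) for i in range(len(kept))]
        return [t for _, t in sorted(zip(keys, kept), key=lambda p: p[0])]

    # Example: corrupt one synthetic source sentence before training.
    print(noise_source("we study monolingual data at scale".split()))

In this setup the corruption is applied on the fly to the synthetic source side only, while the fine-tuning stage in the abstract sees the genuine bitext and the synthetic subset uncorrupted.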
