Understanding Back-Translation at Scale

An effective method for improving neural machine translation with monolingual data is to augment the parallel training corpus with back-translations of target-language sentences. This work broadens the understanding of back-translation and investigates a number of methods for generating synthetic source sentences. We find that in all but resource-poor settings, back-translations obtained via sampling or noised beam outputs are most effective. Our analysis shows that sampling or noisy synthetic data provides a much stronger training signal than data generated by beam or greedy search. We also compare synthetic data to genuine bitext and study various domain effects. Finally, we scale to hundreds of millions of monolingual sentences and achieve a new state of the art of 35 BLEU on the WMT’14 English-German test set.
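To make the contrast concrete, below is a minimal sketch, not the paper's implementation, of the two generation strategies it compares: unrestricted sampling from the backward model versus greedy (MAP-style) decoding, plus the kind of noising applied to beam outputs. The `backward_model` callable, the token-id conventions, and the noise parameter values are all assumptions introduced here for illustration.

```python
# Illustrative sketch of synthetic-source generation for back-translation.
# `backward_model` is a hypothetical target-to-source model mapping
# (target_tokens, source_prefix) to next-token logits over the vocabulary.

import random
import torch

def back_translate(backward_model, tgt_tokens, bos, eos, max_len=100, sampling=True):
    """Generate one synthetic source sentence for a given target sentence."""
    src = [bos]
    for _ in range(max_len):
        logits = backward_model(tgt_tokens, torch.tensor(src))  # shape: (vocab,)
        probs = torch.softmax(logits, dim=-1)
        if sampling:
            # Unrestricted sampling draws from the full model distribution,
            # which the paper finds gives a richer training signal than
            # beam/greedy outputs in all but low-resource settings.
            next_tok = torch.multinomial(probs, num_samples=1).item()
        else:
            # Greedy search (beam search with beam size 1) always picks the
            # most likely token, approximating the MAP output.
            next_tok = int(probs.argmax().item())
        src.append(next_tok)
        if next_tok == eos:
            break
    return src

def add_noise(tokens, drop_p=0.1, blank_p=0.1, shuffle_dist=3, blank="<BLANK>"):
    """Noise a beam output: word dropout, word blanking, local shuffling.
    The operations follow the noising style of Lample et al. (2018) that the
    paper applies to beam outputs; exact values here are illustrative."""
    kept = [t for t in tokens if random.random() >= drop_p]
    blanked = [blank if random.random() < blank_p else t for t in kept]
    # Local shuffle: perturb each index by uniform noise and re-sort, so no
    # token moves more than about shuffle_dist positions.
    keys = [i + random.uniform(0, shuffle_dist) for i in range(len(blanked))]
    return [t for _, t in sorted(zip(keys, blanked), key=lambda p: p[0])]
```

The resulting (synthetic source, genuine target) pairs are then combined with the real bitext to train the forward translation model.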
