Revisiting Low-Resource Neural Machine Translation: A Case Study

It has been shown that the performance of neural machine translation (NMT) drops starkly in low-resource conditions, underperforming phrase-based statistical machine translation (PBSMT) and requiring large amounts of auxiliary data to achieve competitive results. In this paper, we re-assess the validity of these results, arguing that they stem from a lack of system adaptation to low-resource settings. We discuss some pitfalls to be aware of when training low-resource NMT systems, along with recent techniques that have been shown to be especially helpful in low-resource settings, resulting in a set of best practices for low-resource NMT. In our experiments on German-English with different amounts of IWSLT14 training data, we show that, without the use of any auxiliary monolingual or multilingual data, an optimized NMT system can outperform PBSMT with far less data than previously claimed. We also apply these techniques to a low-resource Korean-English dataset, surpassing previously reported results by 4 BLEU.
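To make the kind of system adaptation discussed above concrete, the sketch below contrasts a typical high-resource hyperparameter configuration with adjustments commonly reported to help when parallel data is scarce (smaller subword vocabulary, stronger regularization, smaller batches, parameter sharing). All keys and values here are illustrative assumptions for a generic NMT toolkit, not the exact settings or results reported in the paper.

```python
# Illustrative only: hypothetical hyperparameter choices contrasting a typical
# high-resource NMT setup with low-resource-oriented adaptations.
# Values are assumptions for illustration, not the paper's exact numbers.

high_resource = {
    "bpe_merge_operations": 32000,   # large subword vocabulary
    "batch_size_tokens": 16000,      # large batches
    "dropout": 0.1,                  # light regularization
    "label_smoothing": 0.0,
    "tie_embeddings": False,
}

low_resource = {
    "bpe_merge_operations": 2000,    # smaller subword vocabulary -> fewer rare subwords
    "batch_size_tokens": 1000,       # smaller batches -> more frequent updates
    "dropout": 0.3,                  # stronger regularization against overfitting
    "word_dropout": 0.1,             # randomly drop input tokens as extra regularization
    "label_smoothing": 0.2,
    "tie_embeddings": True,          # share embedding parameters to shrink the model
}

def describe(name, config):
    """Print a configuration as key = value lines."""
    print(f"{name}:")
    for key, value in sorted(config.items()):
        print(f"  {key} = {value}")

if __name__ == "__main__":
    describe("high-resource baseline", high_resource)
    describe("low-resource adaptation", low_resource)
```

The general direction of these changes (more regularization, less model and vocabulary capacity per amount of training data) is what such best-practice adaptations aim for; the specific numbers should be tuned on a development set for the language pair and data size at hand.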
