Competence-based Curriculum Learning for Neural Machine Translation

Current state-of-the-art neural machine translation (NMT) systems use large neural networks that are not only slow to train, but also often require many heuristics and optimization tricks, such as specialized learning rate schedules and large batch sizes. This is undesirable because it requires extensive hyperparameter tuning. In this paper, we propose a curriculum learning framework for NMT that reduces training time, reduces the need for specialized heuristics or large batch sizes, and results in better overall performance. Our framework consists of a principled way of deciding which training samples are shown to the model at different times during training, based on the estimated difficulty of a sample and the current competence of the model. Filtering training samples in this manner prevents the model from getting stuck in bad local optima, making it converge faster and reach a better solution than the common approach of uniformly sampling training examples. Furthermore, the proposed method can be applied to existing NMT models by simply modifying their input data pipelines. We show that our framework reduces training time and improves the performance of both recurrent neural network models and Transformers, achieving up to a 70% decrease in training time while obtaining accuracy improvements of up to 2.2 BLEU.
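
The abstract does not fix a particular difficulty measure or competence schedule, so the following is only a minimal sketch of how such difficulty- and competence-based filtering could be wired into an input data pipeline. It assumes source-sentence length as the difficulty proxy, a simple linear competence schedule, and hypothetical helper names (sentence_difficulty, competence, curriculum_batches) that are not taken from the paper.

```python
import random

def sentence_difficulty(example):
    # Hypothetical difficulty heuristic: source-sentence length in tokens.
    # The framework itself is agnostic to the exact difficulty measure.
    return len(example["source"].split())

def competence(step, curriculum_length, initial_competence=0.01):
    # Linear competence schedule: the fraction of the (difficulty-sorted)
    # training set the model is allowed to sample from at a given step.
    return min(1.0, initial_competence
               + (1.0 - initial_competence) * step / curriculum_length)

def curriculum_batches(dataset, batch_size, curriculum_length):
    # Sorting once by difficulty means a competence value c corresponds to
    # the easiest c-fraction of the data, so "filtering" reduces to a cutoff.
    # Assumes len(dataset) >= batch_size.
    ordered = sorted(dataset, key=sentence_difficulty)
    step = 0
    while True:
        c = competence(step, curriculum_length)
        cutoff = max(batch_size, int(c * len(ordered)))
        # Sample uniformly from the examples the model is "competent" to see.
        yield random.sample(ordered[:cutoff], batch_size)
        step += 1
```

A more literal reading of the abstract would map each example's difficulty to its empirical CDF and admit the example whenever that percentile does not exceed the current competence; sorting the data once and taking the easiest fraction, as above, is an equivalent shortcut when batches are sampled uniformly from the admitted pool.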
