Learning a Multitask Curriculum for Neural Machine Translation

Existing curriculum learning research in neural machine translation (NMT) mostly focuses on a single final task, such as selecting data for a domain or for denoising, and considers in-task example selection. This paper studies the data selection problem in a multitask setting. We present a method for learning a multitask curriculum over a single, diverse, potentially noisy training dataset. The method computes multiple data selection scores for each training example, each measuring how useful the example is to a particular task. It then uses Bayesian optimization to learn a linear weighting of these per-instance scores and sorts the data by the weighted score to form a curriculum. We experiment with three domain translation tasks (two specific domains and the general domain) and show that the learned multitask curriculum delivers results close to individually optimized models, with solid gains over training without a curriculum across all test sets.
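
A minimal sketch of the pipeline the abstract describes (per-task scores per example, a Bayesian-optimized linear weighting, a final sort) is given below. It is not the paper's implementation: the per-task `scorers`, the `validation_loss` proxy that evaluates a candidate curriculum, and the use of scikit-optimize's `gp_minimize` as the Bayesian-optimization backend are all assumptions made for illustration.

```python
# Sketch of a learned multitask curriculum: score each example once per
# task, learn a linear weighting of those scores with Bayesian
# optimization, then sort the data by the weighted score.
import numpy as np
from skopt import gp_minimize  # assumed BO backend, not from the paper


def multitask_curriculum(examples, scorers, validation_loss, n_calls=30):
    """Order `examples` by a learned linear combination of per-task scores.

    examples        : list of training examples (e.g. sentence pairs)
    scorers         : list of functions, one per task, each mapping an
                      example to a float measuring its usefulness
    validation_loss : hypothetical helper; trains a proxy model on the
                      candidate curriculum and returns a multitask
                      validation loss (lower is better)
    """
    # One score per (example, task): shape (n_examples, n_tasks).
    scores = np.array([[s(x) for x in [ex]][0] for ex in examples
                       for s in scorers]).reshape(len(examples), len(scorers))

    def objective(weights):
        # Combine per-task scores under the candidate weighting and
        # sort examples from most to least useful.
        combined = scores @ np.asarray(weights)
        order = np.argsort(-combined)
        return validation_loss([examples[i] for i in order])

    # Bayesian optimization of the weights over the box [0, 1]^n_tasks.
    result = gp_minimize(objective,
                         dimensions=[(0.0, 1.0)] * len(scorers),
                         n_calls=n_calls,
                         random_state=0)

    # Final curriculum under the best weighting found.
    best = scores @ np.asarray(result.x)
    return [examples[i] for i in np.argsort(-best)]
```

In this sketch the expensive step is `validation_loss`, which in practice would train a small NMT model on a prefix of the candidate curriculum; Bayesian optimization is chosen precisely because only a few dozen such evaluations are affordable.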
