Uncertainty-Aware Balancing for Multilingual and Multi-Domain Neural Machine Translation Training

Learning a multilingual and multi-domain translation model is challenging, as heterogeneous and imbalanced data make the model converge inconsistently across corpora in real-world settings. One common practice is to adjust the share of each corpus in training, so that the learning process is balanced and low-resource cases can benefit from high-resource ones. However, automatic balancing methods usually depend on intra- and inter-dataset characteristics, which are often unknown or require human priors. In this work, we propose MULTIUAT, an approach that dynamically adjusts training data usage for multi-corpus machine translation based on the model's uncertainty on a small set of trusted clean data. We experiment with two classes of uncertainty measures in multilingual (16 languages with 4 settings) and multi-domain settings (4 in-domain and 2 out-of-domain on English-German translation) and demonstrate that MULTIUAT substantially outperforms its baselines, including both static and dynamic strategies. We analyze cross-domain transfer and show the deficiency of static and similarity-based methods.
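To make the balancing idea concrete, the following is a minimal sketch, not the paper's exact algorithm: each corpus gets an uncertainty score measured on a small trusted dev set (e.g. mean token-level entropy), and the per-corpus sampling probabilities are set by a temperature-scaled softmax over those scores, so more uncertain corpora are sampled more often. The corpus names and uncertainty values below are hypothetical placeholders.

```python
import math
import random

def corpus_sampling_weights(uncertainties, temperature=1.0):
    """Map per-corpus uncertainty scores (e.g. mean token entropy on a
    trusted dev set) to sampling probabilities via a softmax.
    Higher uncertainty -> larger share of the training data usage."""
    scaled = [u / temperature for u in uncertainties]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def sample_corpus(corpora, weights, rng):
    """Draw one corpus to take the next training batch from."""
    return rng.choices(corpora, weights=weights, k=1)[0]

# Hypothetical uncertainty scores for three translation corpora.
unc = {"en-de": 0.9, "en-fr": 0.4, "en-cs": 1.3}
w = corpus_sampling_weights(list(unc.values()), temperature=0.5)
corpus = sample_corpus(list(unc.keys()), w, random.Random(0))
```

In the full method the uncertainty scores would be re-estimated periodically during training, so the distribution shifts as some corpora converge faster than others; the temperature controls how aggressively the schedule favors uncertain corpora.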
