Multilingual Machine Translation Systems from Microsoft for WMT21 Shared Task

This report describes Microsoft's machine translation systems for the WMT21 shared task on large-scale multilingual machine translation. We participated in all three evaluation tracks: the Large Track, which is unconstrained, and the two Small Tracks, which are fully constrained. Our submissions were initialized with DeltaLM, a generic pre-trained multilingual encoder-decoder model, and fine-tuned on the parallel data collected from the sources permitted by each track's settings. We further applied progressive learning and iterative back-translation to improve performance. Our final submissions ranked first on all three tracks in terms of the automatic evaluation metric.
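At a high level, the recipe combines pre-trained initialization, fine-tuning on parallel data, and iterative back-translation. The Python sketch below illustrates only the iterative back-translation loop; it is a minimal sketch, not the authors' implementation, and the fine_tune and translate helpers (as well as the data variables) are hypothetical placeholders supplied by the caller.

    # Hedged sketch of iterative back-translation on top of a pre-trained,
    # fine-tuned seq2seq model. All helpers (fine_tune, translate) are
    # hypothetical placeholders, not the authors' actual code.
    from typing import Callable, List, Tuple

    ParallelData = List[Tuple[str, str]]  # (source sentence, target sentence)

    def iterative_back_translation(
        model,                       # seq2seq model, e.g. DeltaLM-initialized
        bitext: ParallelData,        # real parallel data allowed by the track
        mono_src: List[str],         # monolingual source-side sentences
        mono_tgt: List[str],         # monolingual target-side sentences
        fine_tune: Callable,         # assumed: fine_tune(model, data) -> model
        translate: Callable,         # assumed: translate(model, sents, direction) -> sents
        rounds: int = 2,
    ):
        """Alternate between back-translating monolingual text and
        re-training on the union of real and synthetic bitext."""
        # Initial fine-tuning on the real parallel data.
        model = fine_tune(model, bitext)
        for _ in range(rounds):
            # Back-translate target-side monolingual text to synthesize
            # source->target training pairs.
            synth_src = translate(model, mono_tgt, direction="tgt->src")
            synthetic = list(zip(synth_src, mono_tgt))
            # For many-to-many systems, do the same in the other direction.
            synth_tgt = translate(model, mono_src, direction="src->tgt")
            synthetic += list(zip(mono_src, synth_tgt))
            # Re-train on real + synthetic data; each round should yield
            # better back-translations for the next round.
            model = fine_tune(model, bitext + synthetic)
        return model

In this sketch, each round regenerates the synthetic data with the current model, which is what makes the procedure iterative rather than a single back-translation pass.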
