Towards Making the Most of Multilingual Pretraining for Zero-Shot Neural Machine Translation

This paper demonstrates that multilingual pretraining, a proper fine-tuning method, and a large-scale parallel dataset from multiple auxiliary languages are all critical for zero-shot translation, where the NMT model is tested on source languages unseen during supervised training. Following this idea, we present SixT++, a strong many-to-English NMT model that supports 100 source languages but is trained once with a parallel dataset from only six source languages. SixT++ initializes the decoder embedding and the full encoder with XLM-R large, and then trains the encoder and decoder layers with a simple two-stage training strategy. SixT++ achieves impressive performance on many-to-English translation, significantly outperforming CRISS and m2m-100, two strong multilingual NMT systems, with average gains of 7.2 and 5.0 BLEU, respectively. Additionally, SixT++ offers a set of model parameters that can be further fine-tuned to develop unsupervised NMT models for low-resource languages. With back-translation on monolingual data of the low-resource language, it outperforms all current state-of-the-art unsupervised methods on Nepali and Sinhala, both for translating into and out of English.
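To make the training schedule concrete, the sketch below gives a minimal PyTorch rendering of one plausible reading of the two-stage strategy: stage 1 freezes the XLM-R-initialized embedding and encoder while the randomly initialized decoder is trained, and stage 2 unfreezes the whole model for end-to-end fine-tuning. All names here (SixTModel, set_stage) and the exact freezing schedule are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class SixTModel(nn.Module):
    """Encoder-decoder NMT model: the token embedding and the full encoder
    are meant to be loaded from an XLM-R large checkpoint; the decoder is
    randomly initialized. (Hypothetical sketch, not the released code.)"""

    def __init__(self, vocab_size: int = 250_002, d_model: int = 1024):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)  # shared src/tgt embedding
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=16,
                                       dim_feedforward=4096, batch_first=True),
            num_layers=24)  # mirrors XLM-R large (24 layers, 16 heads)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=16,
                                       dim_feedforward=4096, batch_first=True),
            num_layers=12)
        self.out_proj = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids: torch.Tensor, tgt_ids: torch.Tensor) -> torch.Tensor:
        memory = self.encoder(self.embed(src_ids))
        # Causal mask so each target position attends only to earlier positions.
        causal = nn.Transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        hidden = self.decoder(self.embed(tgt_ids), memory, tgt_mask=causal)
        return self.out_proj(hidden)


def set_stage(model: SixTModel, stage: int) -> None:
    """Stage 1: freeze the pretrained embedding and encoder, train only the
    new decoder. Stage 2: unfreeze everything for end-to-end fine-tuning."""
    for module in (model.embed, model.encoder):
        for p in module.parameters():
            p.requires_grad = (stage == 2)


# Usage: run stage 1, then continue with stage 2 on the same parameters.
model = SixTModel()
for stage in (1, 2):
    set_stage(model, stage)
    optimizer = torch.optim.Adam(
        (p for p in model.parameters() if p.requires_grad), lr=5e-4)
    # ... run the usual cross-entropy training loop for this stage ...
```

Rebuilding the optimizer per stage ensures only the currently trainable parameters receive updates; the same trick extends naturally to the back-translation fine-tuning for low-resource languages described above.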

[1] Ankur Bapna, et al. The Missing Ingredient in Zero-Shot Neural Machine Translation, 2019, arXiv.

[2] Rico Sennrich, et al. Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation, 2020, ACL.

[3] Fan Yang, et al. XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation, 2020, EMNLP.

[4] Jan Niehues, et al. Toward Multilingual Neural Machine Translation with Universal Encoder and Decoder, 2016, IWSLT.

[5] Yong Wang, et al. Improved Zero-shot Neural Machine Translation via Ignoring Spurious Correlations, 2019, ACL.

[6] Philipp Koehn, et al. Two New Evaluation Datasets for Low-Resource Machine Translation: Nepali-English and Sinhala-English, 2019, arXiv.

[7] Furu Wei, et al. Zero-Shot Cross-Lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders, 2021, EMNLP.

[8] Guillaume Lample, et al. Cross-lingual Language Model Pretraining, 2019, NeurIPS.

[9] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.

[10] Rico Sennrich, et al. Revisiting Low-Resource Neural Machine Translation: A Case Study, 2019, ACL.

[11] Yichao Lu, et al. A neural interlingua for multilingual machine translation, 2018, WMT.

[12] Guillaume Lample, et al. Phrase-Based & Neural Unsupervised Machine Translation, 2018, EMNLP.

[13] Veselin Stoyanov, et al. Unsupervised Cross-lingual Representation Learning at Scale, 2019, ACL.

[14] Mark Dredze, et al. Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT, 2019, EMNLP.

[15] Pascale Fung, et al. Adapting High-resource NMT Models to Translate Low-resource Related Languages without Parallel Data, 2021, ACL.

[16] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[17] Rico Sennrich, et al. Improving Neural Machine Translation Models with Monolingual Data, 2015, ACL.

[18] Weihua Luo, et al. On Learning Universal Representations Across Languages, 2021, ICLR.

[19] Hai Zhao, et al. Unsupervised Neural Machine Translation with Indirect Supervision, 2020, arXiv.

[20] Nan Duan, et al. ProphetNet-X: Large-Scale Pre-training Models for English, Chinese, Multi-lingual, Dialog, and Code Generation, 2021, arXiv.

[21] Jan Niehues, et al. Improving Zero-Shot Translation by Disentangling Positional Information, 2021, ACL/IJCNLP.

[22] Philipp Koehn, et al. Six Challenges for Neural Machine Translation, 2017, NMT@ACL.

[23] Yuqing Tang, et al. Cross-lingual Retrieval for Iterative Self-Supervised Training, 2020, NeurIPS.

[24] Graham Neubig, et al. XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization, 2020, ICML.

[25] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.

[26] Martin Wattenberg, et al. Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation, 2016, TACL.

[27] Yuqing Tang, et al. Multilingual Translation with Extensible Multilingual Pretraining and Finetuning, 2020, arXiv.

[28] Min Zhang, et al. Cross-lingual Pre-training Based Transfer for Zero-shot Neural Machine Translation, 2019, AAAI.

[29] Marc'Aurelio Ranzato, et al. The Flores-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation, 2021, TACL.

[30] Holger Schwenk, et al. Beyond English-Centric Multilingual Machine Translation, 2020, JMLR.

[31] Taku Kudo, et al. Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates, 2018, ACL.

[32] Orhan Firat, et al. Harnessing Multilinguality in Unsupervised Machine Translation for Rare Languages, 2020, NAACL.

[33] Guillaume Lample, et al. Unsupervised Machine Translation Using Monolingual Corpora Only, 2017, ICLR.

[34] Colin Raffel, et al. mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer, 2021, NAACL.

[35] Myle Ott, et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling, 2019, NAACL.

[36] Marjan Ghazvininejad, et al. Multilingual Denoising Pre-training for Neural Machine Translation, 2020, TACL.

[37] Alex Waibel, et al. Improving Zero-shot Translation with Language-Independent Constraints, 2019, WMT.

[38] Graham Neubig, et al. Generalized Data Augmentation for Low-Resource Translation, 2019, ACL.

[39] Mingxuan Wang, et al. Contrastive Learning for Many-to-many Multilingual Neural Machine Translation, 2021, ACL.

[40] Ankur P. Parikh, et al. Consistency by Agreement in Zero-Shot Neural Machine Translation, 2019, NAACL.

[41] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.