XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders

Multilingual machine translation enables a single model to translate between different languages. Most existing multilingual machine translation systems adopt a randomly initialized Transformer backbone. In this work, inspired by the recent success of language model pre-training, we present XLM-T, which initializes the model with an off-the-shelf pretrained cross-lingual Transformer encoder and finetunes it with multilingual parallel data. This simple method achieves significant improvements on a WMT dataset with 10 language pairs and on the OPUS-100 corpus with 94 pairs. Surprisingly, the method remains effective even on top of a strong baseline trained with back-translation. Moreover, extensive analysis of XLM-T on unsupervised syntactic parsing, word alignment, and multilingual classification explains its effectiveness for machine translation.
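The recipe above can be summarized as: take a pretrained cross-lingual encoder, plug it into a standard encoder-decoder NMT architecture, and fine-tune the whole model on multilingual parallel data. The following is a minimal sketch of that idea, assuming XLM-R ("xlm-roberta-base") as the off-the-shelf cross-lingual encoder; the class name, decoder depth, and the choice to leave the decoder randomly initialized are illustrative assumptions, not the authors' exact fairseq setup.

```python
import torch
import torch.nn as nn
from transformers import XLMRobertaModel

class XLMTSketch(nn.Module):
    def __init__(self, model_name="xlm-roberta-base", dec_layers=6):
        super().__init__()
        # Encoder weights come from the pretrained cross-lingual checkpoint.
        self.encoder = XLMRobertaModel.from_pretrained(model_name)
        d_model = self.encoder.config.hidden_size
        # Decoder is a standard Transformer decoder with cross-attention,
        # randomly initialized here and trained during multilingual fine-tuning.
        layer = nn.TransformerDecoderLayer(
            d_model=d_model, nhead=12, dim_feedforward=4 * d_model, batch_first=True
        )
        self.decoder = nn.TransformerDecoder(layer, num_layers=dec_layers)
        # Share the multilingual embedding matrix with the output projection.
        self.embed = self.encoder.get_input_embeddings()
        self.out_proj = nn.Linear(d_model, self.encoder.config.vocab_size, bias=False)
        self.out_proj.weight = self.embed.weight

    def forward(self, src_ids, src_mask, tgt_ids):
        # src_ids would carry a target-language tag, as is common in
        # multilingual NMT; src_mask is 1 for real tokens, 0 for padding.
        memory = self.encoder(input_ids=src_ids,
                              attention_mask=src_mask).last_hidden_state
        tgt = self.embed(tgt_ids)
        tgt_len = tgt_ids.size(1)
        causal = torch.triu(
            torch.full((tgt_len, tgt_len), float("-inf"), device=tgt_ids.device),
            diagonal=1,
        )
        hidden = self.decoder(
            tgt, memory, tgt_mask=causal, memory_key_padding_mask=~src_mask.bool()
        )
        return self.out_proj(hidden)  # logits over the shared vocabulary
```

Fine-tuning then proceeds as in ordinary multilingual NMT: minimize token-level cross-entropy jointly over all language pairs in the parallel corpus.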
