mT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs

Multilingual T5 (mT5) pretrains a sequence-to-sequence model on massive monolingual texts and has shown promising results on many cross-lingual tasks. In this paper, we improve the multilingual text-to-text transfer Transformer with translation pairs (mT6). Specifically, we explore three cross-lingual text-to-text pre-training tasks: machine translation, translation pair span corruption, and translation span corruption. In addition, we propose a partially non-autoregressive objective for text-to-text pre-training. We evaluate these methods on seven multilingual benchmark datasets covering sentence classification, named entity recognition, question answering, and abstractive summarization. Experimental results show that the proposed mT6 improves cross-lingual transferability over mT5.
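To make the pre-training tasks concrete, below is a minimal sketch of translation pair span corruption: T5-style span corruption applied to the concatenation of a parallel sentence pair, so that a masked span can be recovered from context in either language. The function name, whitespace tokenization, <extra_id_n> sentinel tokens, and hyperparameters (noise density, mean span length) are illustrative assumptions, not the paper's actual implementation.

```python
import random

def span_corrupt(tokens, noise_density=0.15, mean_span_length=3):
    """T5-style span corruption: replace random contiguous spans with sentinels.

    Returns (corrupted_input, target), where the target lists each sentinel
    token followed by the original tokens it replaced.
    """
    n = len(tokens)
    num_noise_tokens = max(1, round(n * noise_density))
    num_spans = max(1, round(num_noise_tokens / mean_span_length))

    # Pick span start positions, then mark a fixed-length span from each start.
    starts = sorted(random.sample(range(n), num_spans))
    masked = set()
    for s in starts:
        masked.update(range(s, min(s + mean_span_length, n)))

    corrupted, target = [], []
    sentinel = 0
    i = 0
    while i < n:
        if i in masked:
            corrupted.append(f"<extra_id_{sentinel}>")
            target.append(f"<extra_id_{sentinel}>")
            # Collect all consecutive masked tokens into this span's target.
            while i < n and i in masked:
                target.append(tokens[i])
                i += 1
            sentinel += 1
        else:
            corrupted.append(tokens[i])
            i += 1
    return corrupted, target


# Translation pair span corruption (hypothetical example): corrupt the
# concatenation of a parallel sentence pair so that masked spans can be
# reconstructed from context in either language.
src = "The cat sat on the mat .".split()          # English side
tgt = "Le chat est assis sur le tapis .".split()  # French side
pair = src + ["</s>"] + tgt                       # concatenated translation pair
encoder_input, decoder_target = span_corrupt(pair)
print(" ".join(encoder_input))
print(" ".join(decoder_target))
```

Under this sketch, a translation span corruption variant would restrict masking to one side of the pair, keeping the other sentence intact as a cross-lingual hint, while the machine translation task is the standard source-to-target generation objective.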
