Multilingual Denoising Pre-training for Neural Machine Translation

This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART—a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the BART objective (Lewis et al., 2019). mBART is the first method for pre-training a complete sequence-to-sequence model by denoising full texts in multiple languages, whereas previous approaches have focused only on the encoder, decoder, or reconstructing parts of the text. Pre-training a complete model allows it to be directly fine-tuned for supervised (both sentence-level and document-level) and unsupervised machine translation, with no task-specific modifications. We demonstrate that adding mBART initialization produces performance gains in all but the highest-resource settings, including up to 12 BLEU points for low-resource MT and over 5 BLEU points for many document-level and unsupervised models. We also show that it enables transfer to language pairs with no bi-text or that were not in the pre-training corpus, and present extensive analysis of which factors contribute the most to effective pre-training.
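
To make the pre-training setup concrete, the sketch below is a minimal, self-contained illustration of a BART-style denoising objective as described above: the sentences of a monolingual instance are permuted, spans of tokens are replaced with a mask symbol, and the pair (noised input, original text) becomes one training example for a sequence-to-sequence model. The mask ratio, the Poisson span-length distribution, the placement of the language-id token, and the helper name add_noise are illustrative assumptions, not details stated in this abstract.

import random
import numpy as np

MASK = "<mask>"

def add_noise(sentences, lang_token, mask_ratio=0.35, poisson_lambda=3.5):
    # Sentence permutation: shuffle the order of the sentences in the instance.
    shuffled = sentences[:]
    random.shuffle(shuffled)

    # Text infilling: replace token spans with a single <mask> until roughly
    # `mask_ratio` of the tokens have been removed; span lengths are drawn from
    # a Poisson distribution (a simplified version of the BART noising scheme).
    tokens = " ".join(shuffled).split()
    budget = int(len(tokens) * mask_ratio)
    while budget > 0 and tokens:
        span = max(1, min(int(np.random.poisson(poisson_lambda)), budget))
        start = random.randrange(len(tokens))
        tokens[start:start + span] = [MASK]
        budget -= span

    # The decoder reconstructs the original (unshuffled, unmasked) text; a
    # language-id token marks which language the instance came from (its exact
    # placement here is an assumption).
    source = tokens + [lang_token]
    target = " ".join(sentences).split() + [lang_token]
    return source, target

# Example: one two-sentence English instance.
src, tgt = add_noise(["Pre-training helps translation .", "It uses only monolingual data ."], "[en_XX]")
print("source:", " ".join(src))
print("target:", " ".join(tgt))

A full sequence-to-sequence Transformer is then trained to reconstruct target from source, and, as the abstract notes, the same pre-trained weights can later be fine-tuned directly on downstream sentence-level, document-level, or unsupervised translation tasks.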

[1] Kevin Knight, et al. Translating Translationese: A Two-Step Approach to Unsupervised Machine Translation, 2019, ACL.

[2] Tie-Yan Liu, et al. Machine Translation With Weakly Paired Bilingual Documents, 2018.

[3] James Henderson, et al. Document-Level Neural Machine Translation with Hierarchical Attention Networks, 2018, EMNLP.

[4] Guillaume Lample, et al. Unsupervised Machine Translation Using Monolingual Corpora Only, 2017, ICLR.

[5] Xu Tan, et al. MASS: Masked Sequence to Sequence Pre-training for Language Generation, 2019, ICML.

[6] Yong Wang, et al. Improved Zero-shot Neural Machine Translation via Ignoring Spurious Correlations, 2019, ACL.

[7] Yang Liu, et al. A Teacher-Student Framework for Zero-Resource Neural Machine Translation, 2017, ACL.

[8] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.

[9] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.

[10] Ankur Bapna, et al. Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges, 2019, arXiv.

[11] Guillaume Lample, et al. Phrase-Based & Neural Unsupervised Machine Translation, 2018, EMNLP.

[12] Yoshua Bengio, et al. Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism, 2016, NAACL.

[13] Matt Post, et al. A Call for Clarity in Reporting BLEU Scores, 2018, WMT.

[14] Veselin Stoyanov, et al. Unsupervised Cross-lingual Representation Learning at Scale, 2019, ACL.

[15] Peng-Jen Chen, et al. Facebook AI's WAT19 Myanmar-English Translation Task Submission, 2019, EMNLP.

[16] Omer Levy, et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension, 2019, ACL.

[17] Myle Ott, et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling, 2019, NAACL.

[18] Orhan Firat, et al. Does Neural Machine Translation Benefit from Larger Context?, 2017, arXiv.

[19] Orhan Firat, et al. Massively Multilingual Neural Machine Translation, 2019, NAACL.

[20] Andy Way, et al. Exploiting Cross-Sentence Context for Neural Machine Translation, 2017, EMNLP.

[21] Eva Schlinger, et al. How Multilingual is Multilingual BERT?, 2019, ACL.

[22] Claire Cardie, et al. Unsupervised Multilingual Word Embeddings, 2018, EMNLP.

[23] Tie-Yan Liu, et al. Incorporating BERT into Neural Machine Translation, 2020, ICLR.

[24] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.

[25] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.

[26] Victor O. K. Li, et al. Universal Neural Machine Translation for Extremely Low Resource Languages, 2018, NAACL.

[27] Mauro Cettolo, et al. WIT3: Web Inventory of Transcribed and Translated Talks, 2012, EAMT.

[28] Guillaume Lample, et al. Cross-lingual Language Model Pretraining, 2019, NeurIPS.

[29] Rico Sennrich, et al. Edinburgh Neural Machine Translation Systems for WMT 16, 2016, WMT.

[30] Masao Utiyama, et al. Towards Burmese (Myanmar) Morphological Analysis, 2020, ACM Trans. Asian Low Resour. Lang. Inf. Process.

[31] Jan Niehues, et al. The IWSLT 2015 Evaluation Campaign, 2015, IWSLT.

[32] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, arXiv.

[33] Philipp Koehn, et al. The FLORES Evaluation Datasets for Low-Resource Machine Translation: Nepali–English and Sinhala–English, 2019, EMNLP.

[34] Xin Wang, et al. Extract and Edit: An Alternative to Back-Translation for Unsupervised Neural Machine Translation, 2019, NAACL.

[35] Jörg Tiedemann, et al. Neural Machine Translation with Extended Context, 2017, DiscoMT@EMNLP.

[36] Quoc V. Le, et al. Unsupervised Pretraining for Sequence to Sequence Learning, 2016, EMNLP.

[37] Tomoharu Iwata, et al. Unsupervised Cross-lingual Word Embedding by Multilingual Neural Language Models, 2018, arXiv.

[38] Yang Liu, et al. Learning to Remember Translation History with a Continuous Cache, 2017, TACL.

[39] Dan Roth, et al. Cross-Lingual Ability of Multilingual BERT: An Empirical Study, 2019, ICLR.

[40] Lav R. Varshney, et al. CTRL: A Conditional Transformer Language Model for Controllable Generation, 2019, arXiv.

[41] Mirella Lapata, et al. Text Summarization with Pretrained Encoders, 2019, EMNLP.

[42] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[43] Masao Utiyama, et al. NOVA, 2018, ACM Trans. Asian Low Resour. Lang. Inf. Process.

[44] Xiaodong Liu, et al. Unified Language Model Pre-training for Natural Language Understanding and Generation, 2019, NeurIPS.

[45] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.

[46] Jianfeng Gao, et al. DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation, 2020, ACL.

[47] Martin Wattenberg, et al. Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation, 2016, TACL.

[48] Sergey Edunov, et al. Pre-trained Language Model Representations for Language Generation, 2019, NAACL.

[49] Eneko Agirre, et al. Unsupervised Neural Machine Translation, 2017, ICLR.

[51] Quoc V. Le, et al. Exploiting Similarities among Languages for Machine Translation, 2013, arXiv.

[52] Guillaume Lample, et al. XNLI: Evaluating Cross-lingual Sentence Representations, 2018, EMNLP.

[53] Vishrav Chaudhary, et al. CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data, 2019, LREC.

[54] Guillaume Lample, et al. Word Translation Without Parallel Data, 2017, ICLR.

[55] Taku Kudo, et al. SentencePiece: A Simple and Language Independent Subword Tokenizer and Detokenizer for Neural Text Processing, 2018, EMNLP.

[56] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.

[57] Lei Li, et al. Towards Making the Most of BERT in Neural Machine Translation, 2020, AAAI.

[58] Rico Sennrich, et al. Improving Neural Machine Translation Models with Monolingual Data, 2015, ACL.

[59] Qun Liu, et al. Pretrained Language Models for Document-Level Neural Machine Translation, 2019, arXiv.

[60] Mikel Artetxe, et al. On the Cross-lingual Transferability of Monolingual Representations, 2019, ACL.