VECO: Variable Encoder-decoder Pre-training for Cross-lingual Understanding and Generation

Recent studies on learning multilingual representations have achieved significant performance gains across a wide range of downstream cross-lingual tasks. These studies train either an encoder-only Transformer, mainly for understanding tasks, or an encoder-decoder Transformer, specifically for generation tasks, ignoring the correlation between the two kinds of tasks and frameworks. In contrast, this paper presents a variable encoder-decoder (VECO) pre-training approach that unifies the two mainstream approaches in both model architecture and pre-training task. VECO splits the standard Transformer block into several sub-modules trained with both inner-sequence and cross-sequence masked language modeling, and correspondingly reorganizes certain sub-modules for understanding and generation tasks during inference. This workflow not only ensures that only the most streamlined set of parameters necessary for each kind of task is trained, but also lets the two kinds of tasks boost each other through the shared sub-modules. As a result, VECO delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark, covering text classification, sequence labeling, question answering, and sentence retrieval. For generation tasks, VECO also outperforms all existing cross-lingual models and state-of-the-art Transformer variants on the WMT14 English-to-German and English-to-French translation datasets, with gains of up to 1$\sim$2 BLEU.
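
To make the architectural idea concrete, below is a minimal sketch (in PyTorch, not the authors' released implementation) of a "variable" Transformer block whose sub-modules can be recombined at inference time: with no memory input the block behaves like an encoder layer for understanding tasks, and with encoder states passed as memory plus a causal attention mask it behaves like a decoder layer for generation. The module names, dimensions, and the exact residual/normalization layout are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of a variable encoder-decoder block: self-attention,
# cross-attention, and feed-forward sub-modules that can be reorganized into
# an encoder-only stack (understanding) or an encoder-decoder stack
# (generation). Hyperparameters are placeholder values.
import torch
import torch.nn as nn


class VariableBlock(nn.Module):
    def __init__(self, d_model=768, n_heads=12, d_ff=3072):
        super().__init__()
        # Sub-modules shared across both inference configurations.
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, x, memory=None, attn_mask=None):
        # Inner-sequence modeling: attend within the current sequence.
        h, _ = self.self_attn(x, x, x, attn_mask=attn_mask)
        x = self.norm1(x + h)
        # Cross-sequence modeling: only active when a second sequence is
        # supplied (e.g. the other side of a parallel pair during
        # pre-training, or encoder states during generation).
        if memory is not None:
            h, _ = self.cross_attn(x, memory, memory)
            x = self.norm2(x + h)
        return self.norm3(x + self.ffn(x))
```

Under this sketch, inner-sequence masked language modeling exercises the self-attention and feed-forward sub-modules, while cross-sequence masked language modeling additionally exercises the cross-attention sub-module, so the encoder-only and encoder-decoder configurations share most of their parameters.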
