VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation

Existing work in multilingual pre-training has demonstrated the potential of cross-lingual transferability by training a unified Transformer encoder for multiple languages. However, much of this work relies only on a shared vocabulary and bilingual contexts to encourage correlation across languages, which provides only a loose and implicit alignment of the contextual representations between languages. In this paper, we plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages. This effectively prevents the model from degenerating into predicting masked words conditioned only on the context in its own language. More importantly, when fine-tuning on downstream tasks, the cross-attention module can be plugged in or out on demand, naturally benefiting a wider range of cross-lingual tasks, from language understanding to generation. As a result, the proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark, covering text classification, sequence labeling, question answering, and sentence retrieval. For cross-lingual generation tasks, it also outperforms all existing cross-lingual models and state-of-the-art Transformer variants on the WMT14 English-to-German and English-to-French translation datasets, with gains of up to 1-2 BLEU.
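
Below is a minimal, self-contained sketch (PyTorch-style, not the authors' released code) of the idea described above: a Transformer encoder layer with a cross-attention sub-layer that can be plugged in or out on demand. The class name PluggableCrossAttentionLayer, the flag use_cross_attention, and all hyper-parameter values are illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch (not the authors' code): a Transformer encoder layer whose
# cross-attention sub-layer can be plugged in or out on demand.
import torch
import torch.nn as nn


class PluggableCrossAttentionLayer(nn.Module):
    def __init__(self, d_model: int = 768, n_heads: int = 12, d_ff: int = 3072):
        super().__init__()
        # Standard self-attention over tokens of the current language.
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Extra cross-attention over tokens of the paired language; this is the
        # module that is plugged in during pre-training (and for generation
        # tasks) and plugged out for encoder-only understanding tasks.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, x, paired=None, use_cross_attention=True):
        # x:      (batch, src_len, d_model) hidden states of the current language
        # paired: (batch, tgt_len, d_model) hidden states of the other language,
        #         e.g. from a parallel sentence; ignored when cross-attention is off
        h, _ = self.self_attn(x, x, x)
        x = self.norm1(x + h)
        if use_cross_attention and paired is not None:
            # Predicting masked words can also condition on the other language,
            # explicitly aligning contextual representations across languages.
            h, _ = self.cross_attn(x, paired, paired)
            x = self.norm2(x + h)
        return self.norm3(x + self.ffn(x))
```

Under this sketch, generation tasks such as translation would keep use_cross_attention=True so the model conditions on the paired language, while encoder-only understanding tasks would fine-tune with the cross-attention sub-layer switched off, matching the plug-in/plug-out behaviour described in the abstract.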
