MAD-G: Multilingual Adapter Generation for Efficient Cross-Lingual Transfer

Adapter modules have emerged as a general parameter-efficient means to specialize a pretrained encoder to new domains. Massively multilingual transformers (MMTs) have particularly benefited from additional training of language-specific adapters. However, this approach is not viable for the vast majority of languages, due to limited corpus size or compute budgets. In this work, we propose MAD-G (Multilingual ADapter Generation), which contextually generates language adapters from language representations based on typological features. In contrast to prior work, our time- and space-efficient MAD-G approach enables (1) sharing of linguistic knowledge across languages and (2) zero-shot inference by generating language adapters for unseen languages. We thoroughly evaluate MAD-G in zero-shot cross-lingual transfer on part-of-speech tagging, dependency parsing, and named entity recognition. While offering (1) improved fine-tuning efficiency (by a factor of around 50 in our experiments), (2) a smaller parameter budget, and (3) increased language coverage, MAD-G remains competitive with more expensive methods for language-specific adapter training across the board. Moreover, it offers substantial benefits for low-resource languages, particularly on the NER task in low-resource African languages. Finally, we demonstrate that MAD-G's transfer performance can be further improved via (i) multi-source training, i.e., generating and combining adapters for multiple languages with available task-specific training data, and (ii) further fine-tuning the generated MAD-G adapters for languages with available monolingual data.
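To make the core idea of contextual adapter generation concrete, the following is a minimal sketch of a hypernetwork that maps a typological language vector (e.g., URIEL/lang2vec features) to the weights of a bottleneck adapter, which can then be applied to encoder hidden states. All names and dimensions here (AdapterGenerator, lang_dim, proj_dim, bottleneck, apply_adapter) are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of contextual adapter generation (hypernetwork-style),
# under the assumption of a single bottleneck adapter per layer.
import torch
import torch.nn as nn


class AdapterGenerator(nn.Module):
    """Generates per-language adapter weights from a language feature vector."""

    def __init__(self, lang_dim: int, hidden_dim: int, bottleneck: int, proj_dim: int = 64):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.bottleneck = bottleneck
        # Project the typological language vector to a compact embedding.
        self.lang_proj = nn.Linear(lang_dim, proj_dim)
        # Hypernetwork heads: one flat output per adapter weight matrix / bias.
        self.down_w = nn.Linear(proj_dim, hidden_dim * bottleneck)
        self.down_b = nn.Linear(proj_dim, bottleneck)
        self.up_w = nn.Linear(proj_dim, bottleneck * hidden_dim)
        self.up_b = nn.Linear(proj_dim, hidden_dim)

    def forward(self, lang_vec: torch.Tensor) -> dict:
        """lang_vec: (lang_dim,) typological feature vector for one language."""
        z = torch.tanh(self.lang_proj(lang_vec))
        return {
            "down_w": self.down_w(z).view(self.bottleneck, self.hidden_dim),
            "down_b": self.down_b(z),
            "up_w": self.up_w(z).view(self.hidden_dim, self.bottleneck),
            "up_b": self.up_b(z),
        }


def apply_adapter(hidden: torch.Tensor, params: dict) -> torch.Tensor:
    """Standard bottleneck adapter with a residual connection,
    using generated (rather than directly learned) weights."""
    down = torch.relu(hidden @ params["down_w"].T + params["down_b"])
    up = down @ params["up_w"].T + params["up_b"]
    return hidden + up  # residual connection


if __name__ == "__main__":
    # Toy example: a 289-dim lang2vec-style vector, 768-dim encoder states.
    gen = AdapterGenerator(lang_dim=289, hidden_dim=768, bottleneck=48)
    lang_vec = torch.rand(289)          # would come from URIEL/lang2vec in practice
    adapter_params = gen(lang_vec)      # adapter for a (possibly unseen) language
    states = torch.randn(2, 10, 768)    # (batch, seq_len, hidden)
    out = apply_adapter(states, adapter_params)
    print(out.shape)                    # torch.Size([2, 10, 768])
```

Because the generator is conditioned only on the language vector, adapters for languages never seen during training can be produced zero-shot by feeding in their typological features.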
