Orthogonal Language and Task Adapters in Zero-Shot Cross-Lingual Transfer

Adapter modules, small sets of additional trainable parameters that enable efficient fine-tuning of pretrained transformers, have recently been used for language specialization of multilingual transformers, improving downstream zero-shot cross-lingual transfer. In this work, we propose orthogonal language and task adapters (dubbed orthoadapters) for cross-lingual transfer. They are trained to encode language- and task-specific information that is complementary (i.e., orthogonal) to the knowledge already stored in the pretrained transformer's parameters. Our zero-shot cross-lingual transfer experiments, involving three tasks (POS tagging, NER, NLI) and a set of 10 diverse languages, (1) point to the usefulness of orthoadapters in cross-lingual transfer, especially for the most complex NLI task, but also (2) indicate that the optimal adapter configuration depends heavily on the task and the target language. We hope that our work will motivate a wider investigation of the usefulness of orthogonality constraints in language- and task-specific fine-tuning of pretrained transformers.
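
To make the orthoadapter idea concrete, the sketch below shows one plausible realization under assumptions not spelled out in the abstract: a standard bottleneck adapter whose contribution is encouraged to be orthogonal to the frozen transformer's representation through a soft penalty added to the task loss. The names (OrthoAdapter, ortho_penalty, lambda_ortho) and the exact form of the penalty are illustrative, not the authors' implementation.

```python
# Minimal sketch (not the paper's code): a bottleneck adapter plus a soft
# orthogonality penalty between the adapter's contribution and the frozen
# transformer representation it is injected into.
import torch
import torch.nn as nn


class OrthoAdapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual add."""

    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.act = nn.ReLU()
        self.up = nn.Linear(bottleneck, hidden_size)

    def forward(self, hidden_states: torch.Tensor):
        # adapter_out is the adapter's contribution; the residual connection
        # keeps the original (frozen) transformer representation intact.
        adapter_out = self.up(self.act(self.down(hidden_states)))
        return hidden_states + adapter_out, adapter_out


def ortho_penalty(adapter_out: torch.Tensor, frozen_repr: torch.Tensor) -> torch.Tensor:
    """Soft orthogonality: mean squared dot product between each token's
    adapter contribution and its frozen transformer representation."""
    a = adapter_out.reshape(-1, adapter_out.size(-1))  # (tokens, hidden)
    f = frozen_repr.reshape(-1, frozen_repr.size(-1))  # (tokens, hidden)
    return (a * f).sum(dim=-1).pow(2).mean()


# Usage sketch: combine the task loss with the weighted orthogonality penalty,
# where frozen_repr is the hidden state of the frozen multilingual transformer
# layer and lambda_ortho is a hypothetical weighting hyperparameter:
#   loss = task_loss + lambda_ortho * ortho_penalty(adapter_out, frozen_repr)
```

Driving the penalty toward zero makes the adapter encode information that is complementary to what the frozen transformer already represents, rather than re-encoding it; how exactly the penalty is formulated and weighted is a design choice the abstract leaves open.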
