AdapterHub: A Framework for Adapting Transformers

The current modus operandi in NLP involves downloading and fine-tuning pre-trained models consisting of millions or billions of parameters. Storing and sharing such large trained models is expensive, slow, and time-consuming, which impedes progress towards more general and versatile NLP methods that learn from and for many tasks. Adapters, small learnt bottleneck layers inserted within each layer of a pre-trained model, ameliorate this issue by avoiding fine-tuning of the entire model. However, sharing and integrating adapter layers is not straightforward. We propose AdapterHub, a framework that allows dynamic “stitching-in” of pre-trained adapters for different tasks and languages. The framework, built on top of the popular HuggingFace Transformers library, enables extremely easy and quick adaptation of state-of-the-art pre-trained models (e.g., BERT, RoBERTa, XLM-R) across tasks and languages. Downloading, sharing, and training adapters is as seamless as possible, requiring only minimal changes to training scripts thanks to a specialized infrastructure. Our framework enables scalable and easy sharing of task-specific models, particularly in low-resource scenarios. AdapterHub includes all recent adapter architectures and can be found at AdapterHub.ml.
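
The workflow sketched in the abstract (train a small adapter on top of a frozen pre-trained model, share it, then stitch it back in elsewhere) can be illustrated with the adapter-transformers package that backs AdapterHub. The snippet below is a minimal sketch, not the paper's reference implementation: names such as `AutoModelWithHeads`, `add_adapter`, `train_adapter`, `load_adapter`, and `set_active_adapters` follow the library's documented API, but exact signatures and Hub identifiers (the "sentiment/sst-2@ukp" string is only an illustrative example) vary across versions, so consult AdapterHub.ml for the current interface.

```python
# Minimal sketch, assuming the adapter-transformers package
# (pip install adapter-transformers); names/signatures may differ by version.
from transformers import AutoModelWithHeads, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelWithHeads.from_pretrained("bert-base-uncased")

# Training: add a new task adapter and a matching classification head.
# train_adapter() freezes the pre-trained Transformer weights, so only the
# small bottleneck layers (and the head) receive gradients.
model.add_adapter("sst-2")
model.add_classification_head("sst-2", num_labels=2)
model.train_adapter("sst-2")
# ... run the usual training loop / Trainer on the tokenized task data ...

# Sharing: the trained adapter is only a few megabytes and can be saved
# and uploaded on its own, without the full model checkpoint.
model.save_adapter("./adapter-sst-2", "sst-2")

# Reuse (e.g., in a fresh process on another machine): download a pre-trained
# adapter and stitch it into the frozen base model; the identifier below is an
# illustrative AdapterHub-style string, not a guaranteed Hub entry.
adapter_name = model.load_adapter("sentiment/sst-2@ukp")
model.set_active_adapters(adapter_name)
```

Because only the adapter weights are trained and exchanged, the expensive base model is downloaded once and reused across many tasks, which is what makes sharing task-specific models cheap, particularly in low-resource scenarios.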
