Robust Transfer Learning with Pretrained Language Models through Adapters

Transfer learning with large pretrained transformer-based language models such as BERT has become the dominant approach for most NLP tasks. Simply fine-tuning these large language models on downstream tasks, or combining fine-tuning with task-specific pretraining, is often not robust: performance varies considerably with the random seed and with the number of pretraining and/or fine-tuning iterations, and the fine-tuned model is vulnerable to adversarial attacks. We propose a simple yet effective adapter-based approach to mitigate these issues. Specifically, we insert small bottleneck layers (i.e., adapters) within each layer of a pretrained model, then freeze the pretrained layers and train only the adapter layers on the downstream task data, with (1) task-specific unsupervised pretraining followed by (2) task-specific supervised training (e.g., classification, sequence labeling). Our experiments demonstrate that this training scheme leads to improved stability and adversarial robustness in transfer learning to various downstream tasks.
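To make the described setup concrete, the sketch below illustrates the general adapter idea: a small bottleneck module with a residual connection inserted into each transformer layer, with all pretrained weights frozen so that only the adapter parameters are updated. This is a minimal sketch assuming PyTorch; the module names, bottleneck width, and the "adapter" parameter-name convention are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    class Adapter(nn.Module):
        """Bottleneck adapter: down-project, non-linearity, up-project, residual."""

        def __init__(self, hidden_size: int, bottleneck_size: int = 64):
            super().__init__()
            self.down = nn.Linear(hidden_size, bottleneck_size)  # down-projection
            self.up = nn.Linear(bottleneck_size, hidden_size)    # up-projection
            self.activation = nn.GELU()

        def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
            # The residual connection preserves the pretrained representation;
            # the adapter only learns a small additive correction.
            return hidden_states + self.up(self.activation(self.down(hidden_states)))

    def freeze_all_but_adapters(model: nn.Module) -> None:
        """Keep pretrained weights fixed; train only parameters whose name
        contains 'adapter' (an assumed naming convention for this sketch)."""
        for name, param in model.named_parameters():
            param.requires_grad = "adapter" in name

Under this scheme, the same frozen-backbone configuration would be used for both training stages mentioned above: first task-specific unsupervised pretraining (e.g., masked language modeling on the task corpus) and then supervised training on the downstream task, with only the adapter (and task head) parameters receiving gradient updates.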
