Robust Transfer Learning with Pretrained Language Models through Adapters

Transfer learning with large pretrained transformer-based language models such as BERT has become the dominant approach for most NLP tasks. Simply fine-tuning these large language models on downstream tasks, or combining fine-tuning with task-specific pretraining, is often not robust: performance varies considerably with the random seed and with the number of pretraining and/or fine-tuning iterations, and the fine-tuned model is vulnerable to adversarial attacks. We propose a simple yet effective adapter-based approach to mitigate these issues. Specifically, we insert small bottleneck layers (i.e., adapters) within each layer of a pretrained model, then freeze the pretrained layers and train only the adapter layers on the downstream task data, with (1) task-specific unsupervised pretraining followed by (2) task-specific supervised training (e.g., classification, sequence labeling). Our experiments demonstrate that this training scheme leads to improved stability and adversarial robustness in transfer learning to various downstream tasks.
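To make the described setup concrete, the sketch below illustrates the general adapter idea: a small bottleneck module with a residual connection inserted into each transformer layer, with all pretrained weights frozen so that only the adapter parameters are updated. This is a minimal sketch assuming PyTorch; the module names, bottleneck width, and the "adapter" parameter-name convention are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    class Adapter(nn.Module):
        """Bottleneck adapter: down-project, non-linearity, up-project, residual."""

        def __init__(self, hidden_size: int, bottleneck_size: int = 64):
            super().__init__()
            self.down = nn.Linear(hidden_size, bottleneck_size)  # down-projection
            self.up = nn.Linear(bottleneck_size, hidden_size)    # up-projection
            self.activation = nn.GELU()

        def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
            # The residual connection preserves the pretrained representation;
            # the adapter only learns a small additive correction.
            return hidden_states + self.up(self.activation(self.down(hidden_states)))

    def freeze_all_but_adapters(model: nn.Module) -> None:
        """Keep pretrained weights fixed; train only parameters whose name
        contains 'adapter' (an assumed naming convention for this sketch)."""
        for name, param in model.named_parameters():
            param.requires_grad = "adapter" in name

Under this scheme, the same frozen-backbone configuration would be used for both training stages mentioned above: first task-specific unsupervised pretraining (e.g., masked language modeling on the task corpus) and then supervised training on the downstream task, with only the adapter (and task head) parameters receiving gradient updates.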
