Task-to-Task Transfer Learning with Parameter-Efficient Adapter

Pre-trained language models have achieved strong performance on many NLP tasks. However, they rely on sufficient labeled data for each downstream task and are therefore difficult to train on tasks with limited data. Transfer learning from a large labeled source task to a narrow target task on top of a pre-trained language model can address this problem, but it often suffers from catastrophic forgetting. In this paper, we propose an effective task-to-task transfer learning method with parameter-efficient adapters built on a pre-trained language model, which can be trained on new tasks without degrading performance on those already learned. Our experiments transfer from MNLI or SQuAD (as the source task) to related small-data target tasks, using BERT as the base model. Experimental results show large gains over previous approaches to transfer learning and domain adaptation without forgetting. By adding less than 2.1% additional parameters, our method matches or outperforms vanilla fine-tuning while overcoming catastrophic forgetting.
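To make the adapter idea concrete, below is a minimal sketch, not the paper's exact implementation, of a bottleneck adapter in PyTorch: a down-projection, a nonlinearity, an up-projection, and a residual connection, inserted into a frozen pre-trained Transformer so that only the small per-task adapter (and task head) parameters are trained. The class name BottleneckAdapter, the layer sizes, and the freeze_backbone_except_adapters helper are illustrative assumptions.

```python
# Minimal bottleneck adapter sketch (assumed design): down-project, nonlinearity,
# up-project, residual connection. The pre-trained backbone stays frozen, so each
# new task adds only a small fraction of extra trainable parameters.
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Adapter module inserted after a Transformer sub-layer (hypothetical name)."""

    def __init__(self, hidden_size: int = 768, bottleneck_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.up = nn.Linear(bottleneck_size, hidden_size)
        self.act = nn.GELU()
        # Near-identity initialization: training starts from the frozen
        # backbone's original behaviour.
        nn.init.normal_(self.down.weight, std=1e-3)
        nn.init.zeros_(self.down.bias)
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the backbone representation intact.
        return hidden_states + self.up(self.act(self.down(hidden_states)))


def freeze_backbone_except_adapters(model: nn.Module) -> None:
    """Illustrative helper: train only parameters whose name contains 'adapter'."""
    for name, param in model.named_parameters():
        param.requires_grad = "adapter" in name
```

Because the backbone weights are never updated, each new task contributes only its own adapter and head parameters (a few percent of BERT's size), and previously learned tasks are left untouched, which is how this family of methods avoids catastrophic forgetting.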
