Evaluating Parameter Efficient Learning for Generation

Parameter efficient learning methods (PERMs) have recently gained significant attention as they provide an efficient way for pre-trained language models (PLMs) to adapt to a downstream task. However, conclusions about their effectiveness are mostly drawn from in-domain evaluations over the full training set. In this paper, we present comparisons between PERMs and finetuning from three new perspectives: (1) the effect of sample and model size on in-domain evaluations, (2) generalization to unseen domains and new datasets, and (3) the faithfulness of generations. Our results show that for in-domain settings (a) there is a cross point of sample size below which PERMs perform better than finetuning, and (b) larger PLMs have larger cross points. For cross-domain and cross-dataset cases, we show that (a) Adapter (Houlsby et al., 2019) performs best amongst all the PERMs studied here, and (b) it outperforms finetuning if the task dataset is below a certain size. We also compare the faithfulness of generations and show that PERMs can achieve better faithfulness scores than finetuning, especially for small training sets, by as much as 6%. Finally, we apply Adapter to MT-NLG 530B (Smith et al., 2022) and achieve new state-of-the-art results on XSum (Narayan et al., 2018) for all ROUGE scores (ROUGE-1 49.17, ROUGE-2 27.20, ROUGE-L 40.98).
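Since the comparisons above center on the adapter architecture of Houlsby et al. (2019), the following is a minimal PyTorch sketch of a bottleneck adapter module, included only to illustrate the technique; the module name, hidden size, and bottleneck size are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn


class AdapterLayer(nn.Module):
    """Bottleneck adapter in the style of Houlsby et al. (2019):
    down-projection, nonlinearity, up-projection, residual connection."""

    def __init__(self, hidden_size: int, bottleneck_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.up = nn.Linear(bottleneck_size, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # The residual connection preserves the frozen PLM's representation;
        # only the small down/up projections are trained.
        return hidden_states + self.up(self.act(self.down(hidden_states)))


# Illustration of parameter efficiency: with a hidden size of 1024 and a
# bottleneck of 64, each adapter adds roughly 132K trainable parameters,
# while the PLM's own weights stay frozen.
adapter = AdapterLayer(hidden_size=1024, bottleneck_size=64)
print(sum(p.numel() for p in adapter.parameters() if p.requires_grad))
```

ROUGE scores such as those quoted above are typically computed with a standard ROUGE implementation; the sketch below uses the `rouge_score` package purely as an example, since this page does not specify the paper's scoring toolkit, and the reference and prediction strings are made up for illustration.

```python
# Hedged example: ROUGE-1/2/L F-measures for a single prediction/reference pair.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
reference = "The council approved the new housing plan on Tuesday."
prediction = "The new housing plan was approved by the council."
for name, score in scorer.score(reference, prediction).items():
    print(f"{name}: F1 = {score.fmeasure:.4f}")
```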

[1] Dragomir R. Radev, et al. BRIO: Bringing Order to Abstractive Summarization, 2022, ACL.

[2] Haitao Zheng, et al. Delta Tuning: A Comprehensive Study of Parameter Efficient Methods for Pre-trained Language Models, 2022, ArXiv.

[3] Reza Yazdani Aminabadi, et al. Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model, 2022, ArXiv.

[4] Jie Zhou, et al. On Transferability of Prompt Tuning for Natural Language Processing, 2021, NAACL.

[5] Brian Lester, et al. SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer, 2021, ACL.

[6] Graham Neubig, et al. Towards a Unified View of Parameter-Efficient Transfer Learning, 2021, ICLR.

[7] Minlie Huang, et al. PPT: Pre-trained Prompt Tuning for Few-shot Learning, 2021, ACL.

[8] Yoav Goldberg, et al. BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models, 2021, ACL.

[9] Yelong Shen, et al. LoRA: Low-Rank Adaptation of Large Language Models, 2021, ICLR.

[10] Laure Soulier, et al. Controlling Hallucinations at Word Level in Data-to-Text Generation, 2021, Data Mining and Knowledge Discovery.

[11] Zhilin Yang, et al. P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks, 2022, ACL.

[12] Lu Wang, et al. CLIFF: Contrastive Learning for Improving Faithfulness and Factuality in Abstractive Summarization, 2021, EMNLP.

[13] Ying Nian Wu, et al. Robust Transfer Learning with Pretrained Language Models through Adapters, 2021, ACL.

[14] David Reitter, et al. Increasing Faithfulness in Knowledge-Grounded Dialogue with Controllable Features, 2021, ACL.

[15] Sebastian Ruder, et al. Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks, 2021, ACL.

[16] Lidong Bing, et al. On the Effectiveness of Adapter-based Tuning for Pretrained Language Model Adaptation, 2021, ACL.

[17] Bing Qin, et al. The Factual Inconsistency Problem in Abstractive Text Summarization: A Survey, 2021, ArXiv.

[18] Artidoro Pagnoni, et al. Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics, 2021, NAACL.

[19] Brian Lester, et al. The Power of Scale for Parameter-Efficient Prompt Tuning, 2021, EMNLP.

[20] Andrea Madotto, et al. Neural Path Hunter: Reducing Hallucination in Dialogue Systems via Path Grounding, 2021, EMNLP.

[21] Jason Weston, et al. Retrieval Augmentation Reduces Hallucination in Conversation, 2021, EMNLP.

[22] Zhengxiao Du, et al. GPT Understands, Too, 2021, AI Open.

[23] Zhifang Sui, et al. Towards Faithfulness in Open Domain Table-to-text Generation from an Entity-centric View, 2021, AAAI.

[24] Alexander M. Rush, et al. Parameter-Efficient Transfer Learning with Diff Pruning, 2020, ACL.

[25] Bill Dolan, et al. A Controllable Model of Grounded Response Generation, 2020, AAAI.

[26] Iryna Gurevych, et al. AdapterFusion: Non-Destructive Task Composition for Transfer Learning, 2020, EACL.

[27] Shashi Narayan, et al. A Thorough Evaluation of Task-Specific Pretraining for Summarization, 2021, EMNLP.

[28] Joe Davison, et al. Compacter: Efficient Low-Rank Hypercomplex Adapter Layers, 2021, NeurIPS.

[29] Percy Liang, et al. Prefix-Tuning: Optimizing Continuous Prompts for Generation, 2021, ACL.

[30] Jackie Chi Kit Cheung, et al. Multi-Fact Correction in Abstractive Text Summarization, 2020, EMNLP.

[31] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.

[32] Rico Sennrich, et al. On Exposure Bias, Hallucination and Domain Shift in Neural Machine Translation, 2020, ACL.

[33] Martin Jaggi, et al. Masking as an Efficient Alternative to Finetuning for Pretrained Language Models, 2020, EMNLP.

[34] Quoc V. Le, et al. Towards a Human-like Open-Domain Chatbot, 2020, ArXiv.

[35] Richard Socher, et al. Evaluating the Factual Consistency of Abstractive Text Summarization, 2019, EMNLP.

[36] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.

[37] M. Shoeybi, et al. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism, 2019, ArXiv.

[38] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.

[39] Chin-Yew Lin, et al. A Simple Recipe towards Reducing Hallucination in Neural Surface Realisation, 2019, ACL.

[40] Iain Murray, et al. BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning, 2019, ICML.

[41] Mona Attariyan, et al. Parameter-Efficient Transfer Learning for NLP, 2019, ICML.

[42] Jason Weston, et al. Wizard of Wikipedia: Knowledge-Powered Conversational Agents, 2018, ICLR.

[43] Ashish Agarwal, et al. Hallucinations in Neural Machine Translation, 2018.

[44] Alan W. Black, et al. A Dataset for Document Grounded Conversations, 2018, EMNLP.

[45] Mirella Lapata, et al. Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization, 2018, EMNLP.

[46] Furu Wei, et al. Faithful to the Original: Fact Aware Neural Abstractive Summarization, 2017, AAAI.

[47] Alexander M. Rush, et al. Challenges in Data-to-Document Generation, 2017, EMNLP.

[48] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.

[49] Bowen Zhou, et al. Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond, 2016, CoNLL.

[50] Chin-Yew Lin, et al. ROUGE: A Package for Automatic Evaluation of Summaries, 2004, ACL.