InheritSumm: A General, Versatile and Compact Summarizer by Distilling from GPT

While large models such as GPT-3 demonstrate exceptional performance on zero-shot and few-shot summarization tasks, the high cost of serving and fine-tuning them hinders their use in many applications. Meanwhile, previous studies have found that although automatic metrics tend to favor smaller fine-tuned models, human evaluators judge the summaries those models generate to be inferior to those produced by larger models like GPT-3. To bridge this gap, we propose InheritSumm, a versatile and compact summarization model derived from GPT-3.5 through distillation. InheritSumm not only exhibits zero-shot and few-shot summarization capabilities comparable to GPT-3.5 but is also small enough to be fine-tuned. Experimental results show that InheritSumm achieves similar or better performance than GPT-3.5 in zero-shot and few-shot settings, and that it outperforms the previously best small models in both prefix-tuning and full-data fine-tuning scenarios.
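To make the distillation step concrete, the sketch below illustrates one plausible form of it: sequence-level distillation, where a small seq2seq student is trained with cross-entropy on summaries produced by the GPT-3.5 teacher. This is a minimal illustration only, assuming a generic Hugging Face seq2seq student (the `facebook/bart-base` checkpoint, hyperparameters, and offline-collected teacher summaries are all illustrative assumptions), not the authors' exact training recipe.

```python
# Minimal sketch of sequence-level distillation from a GPT teacher.
# Assumptions (not from the paper): BART as the student, teacher summaries
# already generated offline by GPT-3.5, plain cross-entropy objective.
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM


class DistillDataset(Dataset):
    """Pairs each source document with its teacher-generated summary."""

    def __init__(self, documents, teacher_summaries, tokenizer, max_len=1024):
        self.docs, self.sums = documents, teacher_summaries
        self.tok, self.max_len = tokenizer, max_len

    def __len__(self):
        return len(self.docs)

    def __getitem__(self, i):
        enc = self.tok(self.docs[i], truncation=True,
                       max_length=self.max_len, return_tensors="pt")
        lab = self.tok(self.sums[i], truncation=True,
                       max_length=256, return_tensors="pt")
        return {"input_ids": enc.input_ids.squeeze(0),
                "attention_mask": enc.attention_mask.squeeze(0),
                "labels": lab.input_ids.squeeze(0)}


def distill(documents, teacher_summaries, student_name="facebook/bart-base",
            epochs=1, lr=3e-5, batch_size=2):
    """Train the student to imitate the teacher's summaries."""
    tok = AutoTokenizer.from_pretrained(student_name)
    student = AutoModelForSeq2SeqLM.from_pretrained(student_name)
    data = DistillDataset(documents, teacher_summaries, tok)

    # Pad each field with the value its loss/attention logic expects.
    pad = {"input_ids": tok.pad_token_id, "attention_mask": 0, "labels": -100}

    def collate(batch):
        return {k: torch.nn.utils.rnn.pad_sequence(
                    [b[k] for b in batch], batch_first=True,
                    padding_value=pad[k])
                for k in batch[0]}

    loader = DataLoader(data, batch_size=batch_size, shuffle=True,
                        collate_fn=collate)
    opt = torch.optim.AdamW(student.parameters(), lr=lr)
    student.train()
    for _ in range(epochs):
        for batch in loader:
            # Cross-entropy against the teacher summary tokens.
            loss = student(**batch).loss
            loss.backward()
            opt.step()
            opt.zero_grad()
    return student
```

The same student, once distilled, could then be prefix-tuned or fully fine-tuned on a downstream summarization dataset; the choice of objective and prompting format during distillation is where the paper's actual method would differ from this simplified cross-entropy sketch.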
