InheritSumm: A General, Versatile and Compact Summarizer by Distilling from GPT

While large models such as GPT-3 demonstrate exceptional performance on zero-shot and few-shot summarization tasks, the high cost of serving and fine-tuning them hinders their use in many applications. Meanwhile, previous studies have found that although automatic metrics tend to favor smaller fine-tuned models, human evaluators judge the summaries those models generate to be inferior to those produced by larger models like GPT-3. To bridge this gap, we propose InheritSumm, a versatile and compact summarization model derived from GPT-3.5 through distillation. InheritSumm not only exhibits zero-shot and few-shot summarization capabilities comparable to GPT-3.5 but is also small enough to be fine-tuned. Experimental results show that InheritSumm achieves similar or better performance than GPT-3.5 in zero-shot and few-shot settings, and that it outperforms the previously best small models in both prefix-tuning and full-data fine-tuning scenarios.
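To make the distillation step concrete, the sketch below illustrates one plausible form of it: sequence-level distillation, where a small seq2seq student is trained with cross-entropy on summaries produced by the GPT-3.5 teacher. This is a minimal illustration only, assuming a generic Hugging Face seq2seq student (the `facebook/bart-base` checkpoint, hyperparameters, and offline-collected teacher summaries are all illustrative assumptions), not the authors' exact training recipe.

```python
# Minimal sketch of sequence-level distillation from a GPT teacher.
# Assumptions (not from the paper): BART as the student, teacher summaries
# already generated offline by GPT-3.5, plain cross-entropy objective.
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM


class DistillDataset(Dataset):
    """Pairs each source document with its teacher-generated summary."""

    def __init__(self, documents, teacher_summaries, tokenizer, max_len=1024):
        self.docs, self.sums = documents, teacher_summaries
        self.tok, self.max_len = tokenizer, max_len

    def __len__(self):
        return len(self.docs)

    def __getitem__(self, i):
        enc = self.tok(self.docs[i], truncation=True,
                       max_length=self.max_len, return_tensors="pt")
        lab = self.tok(self.sums[i], truncation=True,
                       max_length=256, return_tensors="pt")
        return {"input_ids": enc.input_ids.squeeze(0),
                "attention_mask": enc.attention_mask.squeeze(0),
                "labels": lab.input_ids.squeeze(0)}


def distill(documents, teacher_summaries, student_name="facebook/bart-base",
            epochs=1, lr=3e-5, batch_size=2):
    """Train the student to imitate the teacher's summaries."""
    tok = AutoTokenizer.from_pretrained(student_name)
    student = AutoModelForSeq2SeqLM.from_pretrained(student_name)
    data = DistillDataset(documents, teacher_summaries, tok)

    # Pad each field with the value its loss/attention logic expects.
    pad = {"input_ids": tok.pad_token_id, "attention_mask": 0, "labels": -100}

    def collate(batch):
        return {k: torch.nn.utils.rnn.pad_sequence(
                    [b[k] for b in batch], batch_first=True,
                    padding_value=pad[k])
                for k in batch[0]}

    loader = DataLoader(data, batch_size=batch_size, shuffle=True,
                        collate_fn=collate)
    opt = torch.optim.AdamW(student.parameters(), lr=lr)
    student.train()
    for _ in range(epochs):
        for batch in loader:
            # Cross-entropy against the teacher summary tokens.
            loss = student(**batch).loss
            loss.backward()
            opt.step()
            opt.zero_grad()
    return student
```

The same student, once distilled, could then be prefix-tuned or fully fine-tuned on a downstream summarization dataset; the choice of objective and prompting format during distillation is where the paper's actual method would differ from this simplified cross-entropy sketch.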
