On Generating Extended Summaries of Long Documents

Prior work in document summarization has mainly focused on generating short summaries. While such a summary provides a high-level view of a document, in some cases it is desirable to know more about the document's salient points than can fit in a short summary. This is typically the case for longer documents such as research papers, legal documents, or books. In this paper, we present a new method for generating extended summaries of long papers. Our method exploits the hierarchical structure of a document and incorporates it into an extractive summarization model through a multi-task learning approach. We then report results on three long-document summarization datasets: arXiv-Long, PubMed-Long, and Longsumm. Our method outperforms or matches the performance of strong baselines. Furthermore, we perform a comprehensive analysis of the generated summaries, offering insights for future research on long-form summary generation. Our analysis shows that our multi-task approach can shift the extraction probability distribution in favor of summary-worthy sentences across diverse sections. Our datasets and code are publicly available at https://github.com/Georgetown-IR-Lab/ExtendedSumm.
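To illustrate the multi-task setup described above, the sketch below combines a sentence-extraction objective with an auxiliary section-classification objective into a single training loss. The function name, the use of NumPy, and the fixed weighting factor `alpha` are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def multitask_loss(ext_logits, ext_labels, sec_logits, sec_labels, alpha=0.5):
    """Joint loss for a multi-task extractive summarizer (illustrative sketch):
    binary cross-entropy for the sentence-extraction head plus categorical
    cross-entropy for predicting which section each sentence belongs to."""
    # Extraction head: sigmoid over per-sentence logits, then binary cross-entropy
    # against 0/1 labels marking summary-worthy sentences.
    p = 1.0 / (1.0 + np.exp(-ext_logits))
    bce = -np.mean(ext_labels * np.log(p) + (1.0 - ext_labels) * np.log(1.0 - p))
    # Section head: numerically stable softmax over section logits, then
    # cross-entropy against each sentence's gold section index.
    z = sec_logits - sec_logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    ce = -np.mean(np.log(probs[np.arange(len(sec_labels)), sec_labels]))
    # Weighted combination; gradients from both tasks shape the shared encoder.
    return alpha * bce + (1.0 - alpha) * ce
```

In this toy view, the auxiliary section-prediction task encourages the shared sentence representations to encode document structure, which is one plausible reading of how hierarchical structure could redistribute extraction probability across sections.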
