Flexible Non-Autoregressive Extractive Summarization with Threshold: How to Extract a Non-Fixed Number of Summary Sentences

Sentence-level extractive summarization is a fundamental yet challenging task. Recent strong approaches select sentences in descending order of predicted probability until a length limit is reached, a.k.a. the "Top-K strategy". Because this length limit is fixed on the validation set, the resulting summaries lack flexibility. In this work, we propose a more flexible and accurate non-autoregressive method for single-document extractive summarization that extracts a non-fixed number of summary sentences without a sorting step. We call our approach ThresSum because it picks sentences from the source document simultaneously and individually, keeping each sentence whose predicted probability exceeds a threshold. During training, the model enhances sentence representations through iterative refinement, and the intermediate latent variables receive weak supervision from soft labels, which are generated progressively by adjusting the temperature in a knowledge distillation algorithm. Specifically, the temperature is initialized with a high value and decreases over the iterations until it reaches 1. Experimental results on the CNN/DM and NYT datasets demonstrate the effectiveness of ThresSum, which significantly outperforms BERTSUMEXT with a substantial improvement of 0.74 ROUGE-1 on CNN/DM.
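For illustration only, the following minimal sketch (in PyTorch-style Python) contrasts the fixed Top-K selection with the threshold-based selection described above, and includes one possible temperature schedule for the KD-style soft labels. The function names, the threshold value tau, the initial temperature, and the linear decay are assumptions made for this sketch, not details drawn from the paper.

```python
import torch

def select_topk(sent_probs, k=3):
    # Top-K strategy: keep the k highest-scoring sentences,
    # with k fixed in advance on the validation set.
    k = min(k, sent_probs.numel())
    return sorted(torch.topk(sent_probs, k).indices.tolist())

def select_threshold(sent_probs, tau=0.5):
    # Threshold strategy: keep every sentence whose predicted probability
    # exceeds tau, so the number of extracted sentences varies per document
    # and no sorting step is needed.
    return [i for i, p in enumerate(sent_probs.tolist()) if p > tau]

def annealed_temperature(step, num_steps, t_init=4.0):
    # Hypothetical schedule: start at a high temperature and decay linearly
    # to 1 by the final refinement iteration.
    return 1.0 + (t_init - 1.0) * (1.0 - step / max(num_steps - 1, 1))

def soft_labels(teacher_logits, temperature):
    # KD-style soft labels: soften per-sentence binary logits with the
    # current temperature before using them as weak supervision.
    return torch.sigmoid(teacher_logits / temperature)

# The same scores yield a fixed-size summary under Top-K but a
# variable-size one under the threshold.
probs = torch.tensor([0.91, 0.12, 0.78, 0.40, 0.31])
print(select_topk(probs, k=3))        # [0, 2, 3]
print(select_threshold(probs, 0.5))   # [0, 2]
```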
