Make Lead Bias in Your Favor: A Simple and Effective Method for News Summarization

Lead bias is a common phenomenon in news summarization, where the early parts of an article often contain the most salient information. While many algorithms exploit this fact in summary generation, it hinders a model from learning to discriminate and extract important information. We propose that lead bias can be leveraged in our favor in a simple and effective way: pretraining abstractive news summarization models on a large-scale unlabeled corpus by predicting the leading sentences from the rest of an article. With careful data cleaning and filtering, our transformer-based pretrained model achieves remarkable results on various news summarization tasks without any finetuning. With further finetuning, our model outperforms many competitive baseline models. Human evaluations further confirm the effectiveness of our method.
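As a rough illustration of the pretraining setup described above, the sketch below constructs (source, target) pairs from raw news articles: the leading sentences act as a pseudo-summary target and the remainder of the article as the model input. The naive sentence splitter, the choice of LEAD_K, and the word-overlap filter are hypothetical stand-ins for the paper's actual data cleaning and filtering pipeline.

```python
import re

# Illustrative settings; these are assumptions, not the paper's exact values.
LEAD_K = 3          # number of leading sentences used as the pseudo-summary target
MIN_OVERLAP = 0.3   # drop pairs where the lead shares too few words with the body
MAX_OVERLAP = 0.9   # drop pairs where the lead is near-verbatim repeated in the body

def split_sentences(article: str) -> list[str]:
    # Naive sentence splitter; a real pipeline would use a proper tokenizer.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", article) if s.strip()]

def make_pretraining_pair(article: str):
    """Return a (source, target) pair for lead-bias pretraining, or None if filtered out."""
    sents = split_sentences(article)
    if len(sents) <= LEAD_K:
        return None  # too short to form a meaningful (source, target) pair

    target = " ".join(sents[:LEAD_K])   # leading sentences serve as the summary target
    source = " ".join(sents[LEAD_K:])   # rest of the article is the input

    # Simple word-overlap filter as a stand-in for "careful data cleaning and filtering".
    target_words = set(target.lower().split())
    source_words = set(source.lower().split())
    overlap = len(target_words & source_words) / max(len(target_words), 1)
    if not (MIN_OVERLAP <= overlap <= MAX_OVERLAP):
        return None
    return source, target
```

In this sketch, the resulting pairs would be fed to a standard sequence-to-sequence transformer, so the same model can later be finetuned on labeled summarization data.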
