Text Summarization with Pretrained Encoders

Bidirectional Encoder Representations from Transformers (BERT) represents the latest incarnation of pretrained language models, which have recently advanced a wide range of natural language processing tasks. In this paper, we showcase how BERT can be usefully applied in text summarization and propose a general framework for both extractive and abstractive models. We introduce a novel document-level encoder based on BERT which is able to express the semantics of a document and obtain representations for its sentences. Our extractive model is built on top of this encoder by stacking several inter-sentence Transformer layers. For abstractive summarization, we propose a new fine-tuning schedule which adopts different optimizers for the encoder and the decoder as a means of alleviating the mismatch between the two (the former is pretrained while the latter is not). We also demonstrate that a two-staged fine-tuning approach can further boost the quality of the generated summaries. Experiments on three datasets show that our model achieves state-of-the-art results across the board in both extractive and abstractive settings. Our code is available at this https URL.
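
The extractive model described in the abstract stacks inter-sentence Transformer layers on top of per-sentence [CLS] vectors produced by BERT. The following is a minimal sketch of that idea in PyTorch, not the authors' released code: it omits details such as interval segment embeddings and the positional encodings used in the full model, and the class name, number of inter-sentence layers, and attention heads are illustrative assumptions.

```python
# Minimal sketch (not the released implementation): a BERT document encoder
# where each sentence is preceded by a [CLS] token, followed by stacked
# inter-sentence Transformer layers and a per-sentence sigmoid classifier.
import torch
import torch.nn as nn
from transformers import BertModel


class ExtractiveSummarizer(nn.Module):
    def __init__(self, num_inter_layers: int = 2):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.bert.config.hidden_size
        layer = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=8, dim_feedforward=2048, batch_first=True
        )
        # Inter-sentence Transformer stacked over the sentence vectors.
        self.inter_sentence = nn.TransformerEncoder(layer, num_layers=num_inter_layers)
        self.classifier = nn.Linear(hidden, 1)

    def forward(self, input_ids, attention_mask, cls_positions):
        # Token-level contextual representations from BERT.
        token_states = self.bert(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        # Gather the vector at each sentence-initial [CLS] position
        # as that sentence's representation: (batch, n_sents, hidden).
        batch_idx = torch.arange(input_ids.size(0)).unsqueeze(1)
        sent_states = token_states[batch_idx, cls_positions]
        sent_states = self.inter_sentence(sent_states)
        # One extraction score per sentence.
        return torch.sigmoid(self.classifier(sent_states)).squeeze(-1)
```

Each [CLS] vector aggregates the sentence that follows it, so the sigmoid output can be read directly as the probability of selecting that sentence for the extractive summary.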

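For the abstractive model, the fine-tuning schedule mentioned in the abstract uses separate optimizers for the pretrained encoder and the randomly initialized decoder. Below is a minimal sketch assuming two Adam optimizers with an inverse-square-root (Noam-style) warmup; the function names and the peak learning rates and warmup steps are illustrative placeholders rather than the released configuration.

```python
# Minimal sketch: two Adam optimizers with separate peak learning rates and
# warmup steps, so the pretrained encoder is fine-tuned more gently than the
# untrained decoder. Hyperparameter values are illustrative placeholders.
import torch


def noam_lr(step: int, peak_lr: float, warmup: int) -> float:
    """lr = peak_lr * min(step^-0.5, step * warmup^-1.5)"""
    step = max(step, 1)
    return peak_lr * min(step ** -0.5, step * warmup ** -1.5)


def build_optimizers(encoder: torch.nn.Module, decoder: torch.nn.Module):
    # Learning rates start at 0 and are set per step by the schedule below.
    enc_opt = torch.optim.Adam(encoder.parameters(), lr=0.0, betas=(0.9, 0.999))
    dec_opt = torch.optim.Adam(decoder.parameters(), lr=0.0, betas=(0.9, 0.999))
    return enc_opt, dec_opt


def step_optimizers(step, enc_opt, dec_opt,
                    enc_peak_lr=2e-3, enc_warmup=20_000,
                    dec_peak_lr=0.1, dec_warmup=10_000):
    # Smaller peak lr / longer warmup for the pretrained encoder,
    # larger peak lr / shorter warmup for the decoder trained from scratch.
    for group in enc_opt.param_groups:
        group["lr"] = noam_lr(step, enc_peak_lr, enc_warmup)
    for group in dec_opt.param_groups:
        group["lr"] = noam_lr(step, dec_peak_lr, dec_warmup)
    enc_opt.step()
    dec_opt.step()
```

The design intent is to keep the encoder's pretrained weights from being disrupted by large early updates while still letting the decoder learn quickly, which is the mismatch the abstract refers to.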