Windowing Models for Abstractive Summarization of Long Texts

Neural summarization models suffer from a fixed-size input limitation: if the text length exceeds the model's maximal number of input tokens, some document content (possibly summary-relevant) gets truncated. Independently summarizing windows of maximal input size prevents information flow between windows and leads to incoherent summaries. We propose windowing models for neural abstractive summarization of (arbitrarily) long texts. We extend the sequence-to-sequence model augmented with a pointer-generator network by (1) allowing the encoder to slide over different windows of the input document and (2) sharing the decoder and retaining its state across different input windows. We explore two windowing variants: Static Windowing precomputes the number of tokens the decoder should generate from each window (based on training-corpus statistics); in Dynamic Windowing, the decoder learns to emit a token that signals the encoder's shift to the next input window. Empirical results show that our models are effective in their intended use case: summarizing long texts whose relevant content is not confined to the very beginning of the document.
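
To make the two variants concrete, the following Python sketch illustrates the windowing control flow under stated assumptions; it is not the authors' implementation. The functions `encode`, `decode_step`, and `init_state`, the window size, and the special tokens are hypothetical placeholders standing in for a pointer-generator encoder/decoder.

```python
# Minimal sketch of static vs. dynamic windowing (illustrative only).
from typing import List, Tuple

MAX_ENC_TOKENS = 400        # assumed encoder input limit
NEXT_WINDOW = "<next>"      # assumed window-shift token (dynamic variant)
EOS = "<eos>"


def split_into_windows(tokens: List[str], size: int = MAX_ENC_TOKENS) -> List[List[str]]:
    """Cut the document into consecutive encoder-sized windows."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)] or [[]]


def init_state() -> dict:
    """Placeholder decoder state (a real model would hold RNN hidden states)."""
    return {}


def encode(window: List[str]) -> List[str]:
    """Placeholder encoder (a real model would return per-token hidden states)."""
    return window


def decode_step(enc_states: List[str], state: dict, prefix: List[str]) -> Tuple[str, dict]:
    """Placeholder decoding step: echoes the encoder token at the current
    position, or EOS once the window is exhausted."""
    token = enc_states[len(prefix)] if len(prefix) < len(enc_states) else EOS
    return token, state


def summarize_static(document: List[str], quotas: List[int]) -> List[str]:
    """Static Windowing: a precomputed token budget per window; the decoder
    state is shared and carried across windows."""
    summary: List[str] = []
    state = init_state()
    for window, quota in zip(split_into_windows(document), quotas):
        enc_states = encode(window)
        for _ in range(quota):
            token, state = decode_step(enc_states, state, summary)
            if token == EOS:
                return summary
            summary.append(token)
    return summary


def summarize_dynamic(document: List[str], max_len: int = 120) -> List[str]:
    """Dynamic Windowing: the decoder emits NEXT_WINDOW to make the encoder
    slide to the next window; its state is never reset."""
    windows = split_into_windows(document)
    summary: List[str] = []
    state = init_state()
    w, enc_states = 0, encode(windows[0])
    while len(summary) < max_len:
        token, state = decode_step(enc_states, state, summary)
        if token == EOS:
            break
        if token == NEXT_WINDOW:
            if w + 1 < len(windows):
                w += 1
                enc_states = encode(windows[w])
            continue
        summary.append(token)
    return summary
```

In the static variant the per-window quotas would come from training-corpus statistics; in the dynamic variant the window shift is learned end to end via the special token.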
