Efficient Attentions for Long Document Summarization

The quadratic computational and memory complexities of large Transformers have limited their scalability for long document summarization. In this paper, we propose Hepos, a novel efficient encoder-decoder attention with head-wise positional strides that pinpoints salient information in the source. We further conduct a systematic study of existing efficient self-attentions. Combined with Hepos, our models can process ten times more tokens than existing models that use full attention. For evaluation, we present a new dataset, GovReport, with significantly longer documents and summaries. Results show that our models produce significantly higher ROUGE scores than competitive comparison systems, including new state-of-the-art results on PubMed. Human evaluation further shows that our models generate more informative summaries with fewer unfaithful errors.
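To make the head-wise positional stride idea concrete, below is a minimal sketch of Hepos-style encoder-decoder attention in a PyTorch setting: head h attends only to source positions j with j mod s equal to h mod s, so each head covers a 1/s slice of the source. The function name `hepos_attention`, the tensor shapes, and the per-head loop are illustrative assumptions for exposition, not the authors' released implementation.

```python
# Illustrative sketch of head-wise positional stride (Hepos-style) attention.
# Assumption: queries come from the decoder, keys/values from the encoder,
# all already split into heads; shapes and names are hypothetical.
import torch
import torch.nn.functional as F

def hepos_attention(q, k, v, stride):
    """
    q: decoder queries  [batch, heads, tgt_len, d]
    k: encoder keys     [batch, heads, src_len, d]
    v: encoder values   [batch, heads, src_len, d]
    stride: positional stride s; head h only attends to source positions j
            with j % s == h % s, shrinking per-head memory by a factor of s.
    """
    b, n_heads, tgt_len, d = q.shape
    src_len = k.size(2)
    outputs = []
    for head in range(n_heads):
        # Source positions assigned to this head under the stride pattern.
        idx = torch.arange(head % stride, src_len, stride, device=q.device)
        k_h = k[:, head, idx]                                # [b, |idx|, d]
        v_h = v[:, head, idx]                                # [b, |idx|, d]
        scores = q[:, head] @ k_h.transpose(-1, -2) / d ** 0.5
        attn = F.softmax(scores, dim=-1)                     # [b, tgt_len, |idx|]
        outputs.append(attn @ v_h)                           # [b, tgt_len, d]
    return torch.stack(outputs, dim=1)                       # [b, heads, tgt_len, d]
```

As long as the stride is no larger than the number of heads, the heads jointly cover every source position, while each head's attention matrix is a factor of the stride smaller than in full encoder-decoder attention.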
