Sparsity and Sentence Structure in Encoder-Decoder Attention of Summarization Systems