Sparse Attention with Linear Units
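Since the page carries no abstract, here is a minimal NumPy sketch of what the title likely refers to: replacing the softmax in scaled dot-product attention with a ReLU, so that negative scores become exact zeros and the attention weights are sparse. The function name `relu_attention` and the RMS-style rescaling of the output (cf. reference [26] on RMSNorm) are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of ReLU-based ("linear unit") attention, assuming the title
# refers to swapping softmax for ReLU in scaled dot-product attention.
# Names and the RMS rescaling step are illustrative, not taken from the paper.
import numpy as np

def relu_attention(Q, K, V, eps=1e-6):
    """Scaled dot-product attention with ReLU instead of softmax.

    Q: (n_q, d), K: (n_k, d), V: (n_k, d_v); returns (n_q, d_v).
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # (n_q, n_k) unnormalized scores
    weights = np.maximum(scores, 0.0)    # ReLU: negative scores become exact zeros (sparse)
    out = weights @ V                    # aggregate values; rows are no longer convex combinations
    # RMS rescaling of the output: one plausible way to compensate for the
    # missing softmax normalization (assumption, see RMSNorm in [26]).
    rms = np.sqrt(np.mean(out ** 2, axis=-1, keepdims=True) + eps)
    return out / rms

# Example usage on random inputs
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
print(relu_attention(Q, K, V).shape)  # (4, 8)
```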
[1] Xuancheng Ren, et al. Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection, 2019, ArXiv.
[2] Shikha Bordia, et al. Do Attention Heads in BERT Track Syntactic Dependencies?, 2019, ArXiv.
[3] Fedor Moiseev, et al. Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned, 2019, ACL.
[4] Philipp Koehn, et al. Findings of the 2018 Conference on Machine Translation (WMT18), 2018, WMT.
[5] John DeNero, et al. Adding Interpretable Attention to Neural Translation Models Improves Word Alignment, 2019, ArXiv.
[6] Rico Sennrich, et al. Neural Machine Translation of Rare Words with Subword Units, 2015, ACL.
[7] Aurko Roy, et al. Efficient Content-Based Sparse Attention with Routing Transformers, 2021, TACL.
[8] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[9] Tianqi Chen, et al. Empirical Evaluation of Rectified Activations in Convolutional Network, 2015, ArXiv.
[10] Philipp Koehn, et al. Saliency-driven Word Alignment Interpretation for Neural Machine Translation, 2019, WMT.
[11] André F. T. Martins, et al. Sparse and Constrained Attention for Neural Machine Translation, 2018, ACL.
[12] Mark Fishel, et al. Confidence through Attention, 2017, MTSummit.
[13] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[14] Mirella Lapata, et al. Text Summarization with Pretrained Encoders, 2019, EMNLP.
[15] Omer Levy, et al. What Does BERT Look at? An Analysis of BERT’s Attention, 2019, BlackboxNLP@ACL.
[16] Vlad Niculae, et al. A Regularized Framework for Sparse and Structured Neural Attention, 2017, NIPS.
[17] Nikolaos Pappas, et al. Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention, 2020, ICML.
[18] Kentaro Inui, et al. Attention is Not Only a Weight: Analyzing Transformers with Vector Norms, 2020, EMNLP.
[19] Yang Liu, et al. Accurate Word Alignment Induction from Neural Machine Translation, 2020, EMNLP.
[20] Ramón Fernández Astudillo, et al. From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification, 2016, ICML.
[21] Mihai Surdeanu, et al. The Stanford CoreNLP Natural Language Processing Toolkit, 2014, ACL.
[22] J. Tiedemann, et al. Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation, 2020, Findings of EMNLP.
[23] Lukasz Kaiser, et al. Reformer: The Efficient Transformer, 2020, ICLR.
[24] Kevin Gimpel, et al. Gaussian Error Linear Units (GELUs), 2016, ArXiv.
[25] André F. T. Martins, et al. Sparse Sequence-to-Sequence Models, 2019, ACL.
[26] Rico Sennrich, et al. Root Mean Square Layer Normalization, 2019, NeurIPS.
[27] Christof Monz, et al. What does Attention in Neural Machine Translation Pay Attention to?, 2017, IJCNLP.
[28] Rico Sennrich, et al. Context-Aware Neural Machine Translation Learns Anaphora Resolution, 2018, ACL.
[29] André F. T. Martins, et al. Adaptively Sparse Transformers, 2019, EMNLP.
[30] Ilya Sutskever, et al. Generating Long Sequences with Sparse Transformers, 2019, ArXiv.
[31] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[32] Byron C. Wallace, et al. Attention is not Explanation, 2019, NAACL.
[33] Geoffrey E. Hinton, et al. Layer Normalization, 2016, ArXiv.
[34] Matt Post, et al. A Call for Clarity in Reporting BLEU Scores, 2018, WMT.
[35] Lukasz Kaiser, et al. Rethinking Attention with Performers, 2020, ArXiv.
[36] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.
[37] Karin M. Verspoor, et al. Findings of the 2016 Conference on Machine Translation, 2016, WMT.
[38] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[39] Hermann Ney, et al. Improved Statistical Alignment Models, 2000, ACL.
[40] Christopher D. Manning, et al. Effective Approaches to Attention-based Neural Machine Translation, 2015, EMNLP.
[41] Yuval Pinter, et al. Attention is not not Explanation, 2019, EMNLP.