SparseBERT: Rethinking the Importance Analysis in Self-attention
James T. Kwok | Zhenguo Li | Xiaodan Liang | Hang Xu | Han Shi | Jiahui Gao | Xiaozhe Ren