Pretraining Without Attention
[1] Danqi Chen, et al. Should You Mask 15% in Masked Language Modeling?, 2022, EACL.
[2] Yi Tay, et al. Efficient Transformers: A Survey, 2020, ACM Comput. Surv.
[3] Khaled Kamal Saab, et al. Hungry Hungry Hippos: Towards Language Modeling with State Space Models, 2022, ICLR.
[4] Luke Zettlemoyer, et al. Mega: Moving Average Equipped Gated Attention, 2022, ICLR.
[5] Scott W. Linderman, et al. Simplified State Space Layers for Sequence Modeling, 2022, ICLR.
[6] Behnam Neyshabur, et al. Long Range Language Modeling via Gated State Spaces, 2022, ICLR.
[7] Christopher Ré, et al. On the Parameterization and Initialization of Diagonal State Space Models, 2022, NeurIPS.
[8] Jonathan Berant, et al. Diagonal State Spaces are as Effective as Structured State Spaces, 2022, NeurIPS.
[9] Quoc V. Le, et al. Transformer Quality in Linear Time, 2022, ICML.
[10] Albert Gu, et al. It's Raw! Audio Generation with State-Space Models, 2022, ICML.
[11] Omer Levy, et al. SCROLLS: Standardized CompaRison Over Long Language Sequences, 2022, EMNLP.
[12] Albert Gu, et al. Efficiently Modeling Long Sequences with Structured State Spaces, 2021, ICLR.
[13] Joshua Ainslie, et al. FNet: Mixing Tokens with Fourier Transforms, 2021, NAACL.
[14] Yi Tay, et al. Are Pretrained Convolutions Better than Pretrained Transformers?, 2021, ACL.
[15] Omer Levy, et al. How to Train BERT with an Academic Budget, 2021, EMNLP.
[16] Hyung Won Chung, et al. Do Transformer Modifications Transfer Across Implementations and Applications?, 2021, EMNLP.
[17] Liu Yang, et al. Long Range Arena: A Benchmark for Efficient Transformers, 2020, ICLR.
[18] C. Ré, et al. HiPPO: Recurrent Memory with Optimal Polynomial Projections, 2020, NeurIPS.
[19] Arman Cohan, et al. Longformer: The Long-Document Transformer, 2020, arXiv.
[20] Noam Shazeer, et al. GLU Variants Improve Transformer, 2020, arXiv.
[21] Omer Levy, et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension, 2019, ACL.
[22] Lysandre Debut, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, arXiv.
[23] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, arXiv.
[24] Noah A. Smith, et al. To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks, 2019, RepL4NLP@ACL.
[25] Yoav Goldberg, et al. Assessing BERT's Syntactic Abilities, 2019, arXiv.
[26] Samuel R. Bowman, et al. Linguistic Analysis of Pretrained Sentence Encoders with Acceptability Judgments, 2019.
[27] Samuel R. Bowman, et al. Neural Network Acceptability Judgments, 2018, Transactions of the Association for Computational Linguistics.
[28] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[29] Tal Linzen, et al. Targeted Syntactic Evaluation of Language Models, 2018, EMNLP.
[30] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[31] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.
[32] Richard Socher, et al. Learned in Translation: Contextualized Word Vectors, 2017, NIPS.
[33] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[34] Eunsol Choi, et al. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension, 2017, ACL.
[35] Yann Dauphin, et al. Language Modeling with Gated Convolutional Networks, 2016, ICML.
[36] Emmanuel Dupoux, et al. Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies, 2016, TACL.
[37] Kevin Gimpel, et al. Gaussian Error Linear Units (GELUs), 2016.
[38] Jian Zhang, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.
[39] Yi Yang, et al. WikiQA: A Challenge Dataset for Open-Domain Question Answering, 2015, EMNLP.