Fastformer: Additive Attention is All You Need