Synthesizer: Rethinking Self-Attention in Transformer Models
Yi Tay, Donald Metzler, Zhe Zhao, Dara Bahri, Da-Cheng Juan, Che Zheng