Gao Huang | Yunhai Tong | Jing Yu | Yaming Yang | Jiangang Bai | Mingliang Zhang | Yujing Wang | Ce Zhang | Jing Bai
[1] Rico Sennrich, et al. Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures, 2018, EMNLP.
[2] Kilian Q. Weinberger, et al. Densely Connected Convolutional Networks, 2017, CVPR.
[3] Lukasz Kaiser, et al. Reformer: The Efficient Transformer, 2020, ICLR.
[4] Dustin Tran, et al. Image Transformer, 2018, ICML.
[5] Byron C. Wallace, et al. ERASER: A Benchmark to Evaluate Rationalized NLP Models, 2020, ACL.
[6] Quoc V. Le, et al. Attention Augmented Convolutional Networks, 2019, ICCV.
[7] Georg Heigold, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, 2021, ICLR.
[8] Jörg Tiedemann, et al. An Analysis of Encoder Representations in Transformer-Based Machine Translation, 2018, BlackboxNLP@EMNLP.
[9] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, arXiv.
[10] Carlos Guestrin, et al. "Why Should I Trust You?": Explaining the Predictions of Any Classifier, 2016, arXiv.
[11] Quoc V. Le, et al. The Evolved Transformer, 2019, ICML.
[12] Ashish Vaswani, et al. Stand-Alone Self-Attention in Vision Models, 2019, NeurIPS.
[13] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, JMLR.
[14] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[15] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[16] Zhijian Liu, et al. Lite Transformer with Long-Short Range Attention, 2020, ICLR.
[17] Yi Tay, et al. Synthesizer: Rethinking Self-Attention for Transformer Models, 2020, ICML.
[18] Wenhu Chen, et al. Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting, 2019, NeurIPS.
[19] Omer Levy, et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.
[20] Byron C. Wallace, et al. Attention is not Explanation, 2019, NAACL.
[21] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.
[22] Ashish Vaswani, et al. Self-Attention with Relative Position Representations, 2018, NAACL.
[23] Sergey Ioffe, et al. Rethinking the Inception Architecture for Computer Vision, 2016, CVPR.
[24] Yuval Pinter, et al. Attention is not not Explanation, 2019, EMNLP.
[25] Edouard Grave, et al. Adaptive Attention Span in Transformers, 2019, ACL.
[26] Chris Quirk, et al. Novel positional encodings to enable tree-based transformers, 2019, NeurIPS.
[27] Thomas Wolf, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, arXiv.
[28] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[29] Yann Dauphin, et al. Convolutional Sequence to Sequence Learning, 2017, ICML.