Anima Anandkumar | Tom Goldstein | Chen Zhu | Chaowei Xiao | Mohammad Shoeybi | Bryan Catanzaro | Wei Ping
[1] Aurko Roy, et al. Efficient Content-Based Sparse Attention with Routing Transformers, 2021, TACL.
[2] Dilin Wang, et al. Improve Vision Transformers Training by Suppressing Over-smoothing, 2021, arXiv.
[3] Arman Cohan, et al. Longformer: The Long-Document Transformer, 2020, arXiv.
[4] Lu Yuan, et al. Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding, 2021, arXiv.
[5] Liwei Wang, et al. On Layer Normalization in the Transformer Architecture, 2020, ICML.
[6] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[7] Yee Whye Teh, et al. Set Transformer, 2018, ICML.
[8] Ilya Sutskever, et al. Generating Long Sequences with Sparse Transformers, 2019, arXiv.
[9] Glenn M. Fung, et al. Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention, 2021, AAAI.
[10] Lei Huang, et al. Centered Weight Normalization in Accelerating Training of Deep Neural Networks, 2017, ICCV.
[11] Christopher Potts, et al. Learning Word Vectors for Sentiment Analysis, 2011, ACL.
[12] Lukasz Kaiser, et al. Rethinking Attention with Performers, 2020, arXiv.
[13] Jonathon Shlens, et al. Scaling Local Self-Attention for Parameter Efficient Visual Backbones, 2021, CVPR.
[14] Noah Constant, et al. Character-Level Language Modeling with Deeper Self-Attention, 2018, AAAI.
[15] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[16] D. Song, et al. The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization, 2021, ICCV.
[17] Stephen Lin, et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, 2021, ICCV.
[18] Ilya Sutskever, et al. Zero-Shot Text-to-Image Generation, 2021, ICML.
[19] Jesús Villalba, et al. Hierarchical Transformers for Long Document Classification, 2019, ASRU.
[20] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[21] Omer Levy, et al. Blockwise Self-Attention for Long Document Understanding, 2020, EMNLP.
[22] Jacob Devlin, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[23] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, arXiv.
[24] Dustin Tran, et al. Image Transformer, 2018, ICML.
[25] François Chollet, et al. Xception: Deep Learning with Depthwise Separable Convolutions, 2017, CVPR.
[26] Nikolaos Pappas, et al. Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention, 2020, ICML.
[27] Yiming Yang, et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context, 2019, ACL.
[28] Jian Sun, et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015, ICCV.
[29] Thomas G. Dietterich, et al. Benchmarking Neural Network Robustness to Common Corruptions and Perturbations, 2019, ICLR.
[30] Zheng Zhang, et al. BP-Transformer: Modelling Long-Range Context via Binary Partitioning, 2019, arXiv.
[31] Ming-Wei Chang, et al. Natural Questions: A Benchmark for Question Answering Research, 2019, TACL.
[32] Edouard Grave, et al. Adaptive Attention Span in Transformers, 2019, ACL.
[33] Han Fang, et al. Linformer: Self-Attention with Linear Complexity, 2020, arXiv.
[34] Frank Hutter, et al. SGDR: Stochastic Gradient Descent with Warm Restarts, 2017, ICLR.
[35] Samuel R. Bowman, et al. ListOps: A Diagnostic Dataset for Latent Tree Learning, 2018, NAACL.
[36] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[37] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.
[38] Shuicheng Yan, et al. Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet, 2021, arXiv.
[39] Lukasz Kaiser, et al. Reformer: The Efficient Transformer, 2020, ICLR.
[40] Ling Shao, et al. Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions, 2021, arXiv.
[41] Georg Heigold, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, 2021, ICLR.
[42] Zhen Qin, et al. OmniNet: Omnidirectional Representations from Transformers, 2021, ICML.
[43] Timothy P. Lillicrap, et al. Compressive Transformers for Long-Range Sequence Modelling, 2020, ICLR.
[44] N. Codella, et al. CvT: Introducing Convolutions to Vision Transformers, 2021, ICCV.
[45] Tim Salimans, et al. Axial Attention in Multidimensional Transformers, 2019, arXiv.
[46] Andrew Zisserman, et al. Perceiver: General Perception with Iterative Attention, 2021, ICML.
[47] Li Yang, et al. Big Bird: Transformers for Longer Sequences, 2020, NeurIPS.
[48] Li Yang, et al. ETC: Encoding Long and Structured Inputs in Transformers, 2020, EMNLP.
[49] Kaitao Song, et al. PVTv2: Improved Baselines with Pyramid Vision Transformer, 2021, arXiv.
[50] Roy Schwartz, et al. Random Feature Attention, 2021, ICLR.
[51] Aleksander Madry, et al. Noise or Signal: The Role of Image Backgrounds in Object Recognition, 2020, ICLR.
[52] Dawn Song, et al. Natural Adversarial Examples, 2021, CVPR.
[53] Benjamin Recht, et al. Do ImageNet Classifiers Generalize to ImageNet?, 2019, ICML.
[54] Dragomir R. Radev, et al. The ACL anthology network corpus, 2009, Language Resources and Evaluation.
[55] Ali Razavi, et al. Generating Diverse High-Fidelity Images with VQ-VAE-2, 2019, NeurIPS.
[56] Liu Yang, et al. Long Range Arena: A Benchmark for Efficient Transformers, 2021, ICLR.
[57] Liu Yang, et al. Sparse Sinkhorn Attention, 2020, ICML.
[58] Matthieu Cord, et al. Training data-efficient image transformers & distillation through attention, 2021, ICML.
[59] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[60] Xiaohua Zhai, et al. Are we done with ImageNet?, 2020, arXiv.
[61] Zhijian Liu, et al. Lite Transformer with Long-Short Range Attention, 2020, ICLR.
[62] Yi Tay, et al. Synthesizer: Rethinking Self-Attention for Transformer Models, 2020, ICML.