暂无分享,去创建一个
[1] Matthijs Douze,et al. LeViT: a Vision Transformer in ConvNet’s Clothing for Faster Inference , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[2] Hongyi Zhang,et al. mixup: Beyond Empirical Risk Minimization , 2017, ICLR.
[3] Chen Sun,et al. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[4] Lukasz Kaiser,et al. Rethinking Attention with Performers , 2020, ArXiv.
[5] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[6] Mark Sandler,et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[7] Fengwei Yu,et al. Incorporating Convolution Designs into Visual Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[8] Stephen Lin,et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[9] Bo Chen,et al. MnasNet: Platform-Aware Neural Architecture Search for Mobile , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Matthieu Cord,et al. Training data-efficient image transformers & distillation through attention , 2020, ICML.
[11] Ashish Vaswani,et al. Stand-Alone Self-Attention in Vision Models , 2019, NeurIPS.
[12] Matthieu Cord,et al. Going deeper with Image Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[13] Irwan Bello. LambdaNetworks: Modeling Long-Range Interactions Without Attention , 2021, ICLR.
[14] Kevin Gimpel,et al. Gaussian Error Linear Units (GELUs) , 2016 .
[15] Pieter Abbeel,et al. Bottleneck Transformers for Visual Recognition , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Fahad Shahbaz Khan,et al. Transformers in Vision: A Survey , 2021, ACM Comput. Surv..
[17] Shuai Yi,et al. Efficient Attention: Attention with Linear Complexities , 2018, 2021 IEEE Winter Conference on Applications of Computer Vision (WACV).
[18] Sergey Ioffe,et al. Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Colin Raffel,et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res..
[20] Jian Sun,et al. Identity Mappings in Deep Residual Networks , 2016, ECCV.
[21] Abhinav Gupta,et al. Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[22] Makoto Yamada,et al. Transformer Dissection: An Unified Understanding for Transformer's Attention via the Lens of Kernel , 2019, EMNLP/IJCNLP.
[23] K. Simonyan,et al. High-Performance Large-Scale Image Recognition Without Normalization , 2021, ICML.
[24] Ling Shao,et al. Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions , 2021, ArXiv.
[25] Alexander Kolesnikov,et al. Scaling Vision Transformers , 2021, ArXiv.
[26] Fei-Fei Li,et al. ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[27] Jonathon Shlens,et al. Scaling Local Self-Attention for Parameter Efficient Visual Backbones , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[28] Nikolaos Pappas,et al. Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention , 2020, ICML.
[29] Quoc V. Le,et al. Attention Augmented Convolutional Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[30] N. Codella,et al. CvT: Introducing Convolutions to Vision Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[31] Yiming Yang,et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context , 2019, ACL.
[32] Frank Hutter,et al. Decoupled Weight Decay Regularization , 2017, ICLR.
[33] Quoc V. Le,et al. Randaugment: Practical automated data augmentation with a reduced search space , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[34] Guokun Lai,et al. Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing , 2020, NeurIPS.
[35] Enhua Wu,et al. Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[36] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[37] Xiaojie Jin,et al. DeepViT: Towards Deeper Vision Transformer , 2021, ArXiv.
[38] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[39] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[40] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[41] D. Tao,et al. A Survey on Visual Transformer , 2020, ArXiv.
[42] Quoc V. Le,et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks , 2019, ICML.
[43] Kilian Q. Weinberger,et al. Deep Networks with Stochastic Depth , 2016, ECCV.
[44] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[45] Georg Heigold,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2021, ICLR.
[46] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[47] Ashish Vaswani,et al. Self-Attention with Relative Position Representations , 2018, NAACL.
[48] Mingxing Tan,et al. EfficientNetV2: Smaller Models and Faster Training , 2021, ICML.
[49] Shuicheng Yan,et al. Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet , 2021, ArXiv.
[50] Taco S. Cohen,et al. A Data and Compute Efficient Design for Limited-Resources Deep Learning , 2020, ArXiv.