[1] Seong Joon Oh, et al. Rethinking Spatial Dimensions of Vision Transformers, IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
[2] Stephen Lin, et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
[3] Mark Chen, et al. Language Models are Few-Shot Learners, NeurIPS, 2020.
[4] Nicolas Usunier, et al. End-to-End Object Detection with Transformers, ECCV, 2020.
[5] Ali Farhadi, et al. You Only Look Once: Unified, Real-Time Object Detection, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[6] Lukasz Kaiser, et al. Attention Is All You Need, NIPS, 2017.
[7] Jonathan Krause, et al. 3D Object Representations for Fine-Grained Categorization, IEEE International Conference on Computer Vision Workshops, 2013.
[8] Quoc V. Le, et al. Searching for MobileNetV3, IEEE/CVF International Conference on Computer Vision (ICCV), 2019.
[9] Dumitru Erhan, et al. Going deeper with convolutions, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[10] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[11] Quoc V. Le, et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, ICML, 2019.
[12] N. Codella, et al. CvT: Introducing Convolutions to Vision Transformers, IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
[13] Shuicheng Yan, et al. VOLO: Vision Outlooker for Visual Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
[14] Zhuowen Tu, et al. Co-Scale Conv-Attentional Image Transformers, IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
[15] Vijay Vasudevan, et al. Learning Transferable Architectures for Scalable Image Recognition, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[16] Matthijs Douze, et al. XCiT: Cross-Covariance Image Transformers, NeurIPS, 2021.
[17] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, CVPR, 2009.
[18] Quoc V. Le, et al. Do Better ImageNet Models Transfer Better?, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[19] Jian Sun, et al. Deep Residual Learning for Image Recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[20] Fengwei Yu, et al. Incorporating Convolution Designs into Visual Transformers, IEEE/CVF International Conference on Computer Vision (ICCV), 2021.
[21] Dacheng Tao, et al. ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias, NeurIPS, 2021.
[22] Georg Heigold, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, ICLR, 2021.
[23] Weijian Li, et al. ConTNet: Why not use convolution and transformer at the same time?, arXiv, 2021.
[24] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, NAACL, 2019.
[25] Levent Sagun, et al. ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases, ICML, 2021.
[26] Shuicheng Yan, et al. Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet, arXiv, 2021.
[27] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[28] Luc Van Gool, et al. LocalViT: Bringing Locality to Vision Transformers, arXiv, 2021.
[29] Kaiming He, et al. Designing Network Design Spaces, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[30] Alexander Kolesnikov, et al. How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers, arXiv, 2021.
[31] Alec Radford, et al. Improving Language Understanding by Generative Pre-Training, 2018.
[32] Andrew Zisserman, et al. Perceiver: General Perception with Iterative Attention, ICML, 2021.
[33] Chen Sun, et al. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era, IEEE International Conference on Computer Vision (ICCV), 2017.
[34] Honglak Lee, et al. An Analysis of Single-Layer Networks in Unsupervised Feature Learning, AISTATS, 2011.
[35] Lu Yuan, et al. Focal Self-attention for Local-Global Interactions in Vision Transformers, arXiv, 2021.
[36] Matthieu Cord, et al. Training data-efficient image transformers & distillation through attention, ICML, 2021.
[37] Lu Yuan, et al. Mobile-Former: Bridging MobileNet and Transformer, arXiv, 2021.
[38] Omer Levy, et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension, ACL, 2020.