Global Context Vision Transformers
[1] L. Dudziak,et al. EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers , 2022, ECCV.
[2] P. Milanfar,et al. MaxViT: Multi-Axis Vision Transformer , 2022, ECCV.
[3] H. Shum,et al. DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection , 2022, ICLR.
[4] J. Zico Kolter,et al. Patches Are All You Need? , 2022, Trans. Mach. Learn. Res.
[5] Trevor Darrell,et al. A ConvNet for the 2020s , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[6] J. Álvarez,et al. A-ViT: Adaptive Tokens for Efficient Vision Transformer , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Shuicheng Yan,et al. MetaFormer is Actually What You Need for Vision , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Li Dong,et al. Swin Transformer V2: Scaling Up Capacity and Resolution , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[9] A. Dosovitskiy,et al. Do Vision Transformers See Like Convolutional Neural Networks? , 2021, NeurIPS.
[10] Nenghai Yu,et al. CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[11] P. Luo,et al. PVT v2: Improved baselines with Pyramid Vision Transformer , 2021, Computational Visual Media.
[12] Matthijs Douze,et al. XCiT: Cross-Covariance Image Transformers , 2021, NeurIPS.
[13] Quoc V. Le,et al. CoAtNet: Marrying Convolution and Attention for All Data Sizes , 2021, NeurIPS.
[14] Matthieu Cord,et al. ResMLP: Feedforward Networks for Image Classification With Data-Efficient Training , 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[15] Chunhua Shen,et al. Twins: Revisiting the Design of Spatial Attention in Vision Transformers , 2021, NeurIPS.
[16] Matthijs Douze,et al. LeViT: a Vision Transformer in ConvNet’s Clothing for Faster Inference , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[17] Mingxing Tan,et al. EfficientNetV2: Smaller Models and Faster Training , 2021, ICML.
[18] N. Codella,et al. CvT: Introducing Convolutions to Vision Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[19] Quanfu Fan,et al. CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[20] Xiang Li,et al. Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[21] Chunhua Shen,et al. Conditional Positional Encodings for Vision Transformers , 2021, ICLR.
[22] Francis E. H. Tay,et al. Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[23] Matthieu Cord,et al. Training data-efficient image transformers & distillation through attention , 2020, ICML.
[24] S. Gelly,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2020, ICLR.
[25] M. Zaheer,et al. Big Bird: Transformers for Longer Sequences , 2020, NeurIPS.
[26] Kaiming He,et al. Designing Network Design Spaces , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Kai Chen,et al. MMDetection: Open MMLab Detection Toolbox and Benchmark , 2019, ArXiv.
[28] Benjamin Recht,et al. Do ImageNet Classifiers Generalize to ImageNet? , 2019, ICML.
[29] Yuning Jiang,et al. Unified Perceptual Parsing for Scene Understanding , 2018, ECCV.
[30] Yichen Wei,et al. Relation Networks for Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[31] Frank Hutter,et al. Decoupled Weight Decay Regularization , 2017, ICLR.
[32] Gang Sun,et al. Squeeze-and-Excitation Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[33] Bolei Zhou,et al. Scene Parsing through ADE20K Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[35] Bo Chen,et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.
[36] Ross B. Girshick,et al. Mask R-CNN , 2017, arXiv:1703.06870.
[37] Zhuowen Tu,et al. Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Ramprasaath R. Selvaraju,et al. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization , 2016, International Journal of Computer Vision.
[39] Kilian Q. Weinberger,et al. Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[40] Kevin Gimpel,et al. Gaussian Error Linear Units (GELUs) , 2016, arXiv:1606.08415.
[41] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Sergey Ioffe,et al. Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[43] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[44] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[45] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[46] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[47] Fei-Fei Li,et al. ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[48] Chunyuan Li,et al. Focal Attention for Long-Range Interactions in Vision Transformers , 2021, NeurIPS.
[49] Pavlo Molchanov,et al. NViT: Vision Transformer Compression and Parameter Redistribution , 2021, ArXiv.
[50] Stephen Lin,et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).