Furu Wei, Yutong Lin, Han Hu, Zheng Zhang, Yue Cao, Li Dong, Baining Guo, Zhenda Xie, Zhuliang Yao, Yixuan Wei, Ze Liu, Jia Ning
[1] Benjamin Recht, et al. Do ImageNet Classifiers Generalize to ImageNet?, 2019, ICML.
[2] Luc Van Gool, et al. LocalViT: Bringing Locality to Vision Transformers, 2021, arXiv.
[3] Shuicheng Yan, et al. VOLO: Vision Outlooker for Visual Recognition, 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[4] Nenghai Yu, et al. CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows, 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Cheng He, et al. FaPN: Feature-aligned Pyramid Network for Dense Image Prediction, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[6] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.
[7] Stephen Lin, et al. Local Relation Networks for Image Recognition, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[8] Hongyang Chao, et al. Rethinking and Improving Relative Position Encoding for Vision Transformer, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[9] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[10] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[11] Yichen Wei, et al. Relation Networks for Object Detection, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[12] Lu Yuan, et al. Dynamic Head: Unifying Object Detection Heads with Attentions, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Lu Yuan, et al. Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding, 2021, arXiv.
[14] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Quoc V. Le, et al. CoAtNet: Marrying Convolution and Attention for All Data Sizes, 2021, NeurIPS.
[16] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[17] Chien-Yao Wang, et al. You Only Learn One Representation: Unified Network for Multiple Tasks, 2021, J. Inf. Sci. Eng.
[18] Geoffrey E. Hinton, et al. Layer Normalization, 2016, arXiv.
[19] Kaiming He, et al. Group Normalization, 2018, ECCV.
[20] Matthieu Cord, et al. Training data-efficient image transformers & distillation through attention, 2020, ICML.
[21] Furu Wei, et al. BEiT: BERT Pre-Training of Image Transformers, 2021, arXiv.
[22] Tingting Liang, et al. CBNet: A Composite Backbone Network Architecture for Object Detection, 2021, IEEE Transactions on Image Processing.
[23] Klaus-Robert Müller, et al. SchNet: A continuous-filter convolutional neural network for modeling quantum interactions, 2017, NIPS.
[24] Noam Shazeer, et al. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity, 2021, arXiv.
[25] Zheng Zhang, et al. A Closer Look at Local Aggregation Operators in Point Cloud Analysis, 2020, ECCV.
[26] Tianqi Chen, et al. Training Deep Nets with Sublinear Memory Cost, 2016, arXiv.
[27] Raquel Urtasun, et al. Deep Parametric Continuous Convolutional Neural Networks, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[28] Jianmin Bao, et al. SimMIM: A Simple Framework for Masked Image Modeling, 2021, arXiv.
[29] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[30] Jianfeng Gao, et al. UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training, 2020, ICML.
[31] Quoc V. Le, et al. Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Trevor Darrell, et al. Early Convolutions Help Transformers See Better, 2021, NeurIPS.
[33] Shuicheng Yan, et al. Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet, 2021, arXiv.
[34] Carlos Riquelme, et al. Scaling Vision with Sparse Mixture of Experts, 2021, NeurIPS.
[35] Jian Sun, et al. Objects365: A Large-Scale, High-Quality Dataset for Object Detection, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[36] Georg Heigold, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, 2021, ICLR.
[37] Michael S. Ryoo, et al. TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?, 2021, arXiv.
[38] Alexander G. Schwing, et al. Per-Pixel Classification is Not All You Need for Semantic Segmentation, 2021, NeurIPS.
[39] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[40] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[41] Stephen Lin, et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[42] Tie-Yan Liu, et al. Rethinking Positional Encoding in Language Pre-training, 2020, ICLR.
[43] Yutong Lin, et al. Leveraging Batch Normalization for Vision Transformers, 2021, 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW).
[44] Lu Yuan, et al. Focal Self-attention for Local-Global Interactions in Vision Transformers, 2021, arXiv.
[45] Dumitru Erhan, et al. Going deeper with convolutions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[46] Samuel L. Smith, et al. Characterizing signal propagation to close the performance gap in unnormalized ResNets, 2021, ICLR.
[47] Fabio Viola, et al. The Kinetics Human Action Video Dataset, 2017, arXiv.
[48] Zilong Huang, et al. Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer, 2021, arXiv.
[49] Xiang Bai, et al. End-to-End Semi-Supervised Object Detection with Soft Teacher, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[50] Olatunji Ruwase, et al. ZeRO: Memory Optimizations Toward Training Trillion Parameter Models, 2020, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis.
[51] Armand Joulin, et al. Self-supervised Pretraining of Visual Features in the Wild, 2021, arXiv.
[52] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.
[53] Ilya Sutskever, et al. Learning Transferable Visual Models From Natural Language Supervision, 2021, ICML.
[54] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[55] Alec Radford, et al. Scaling Laws for Neural Language Models, 2020, arXiv.
[56] Andrea Vedaldi, et al. Instance Normalization: The Missing Ingredient for Fast Stylization, 2016, arXiv.
[57] Bolei Zhou, et al. Semantic Understanding of Scenes Through the ADE20K Dataset, 2016, International Journal of Computer Vision.
[58] Ling Shao, et al. Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions, 2021, arXiv.
[59] Tao Xiang, et al. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers, 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[60] Lucas Beyer, et al. Big Transfer (BiT): General Visual Representation Learning, 2020, ECCV.
[61] Cordelia Schmid, et al. ViViT: A Video Vision Transformer, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[62] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[63] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[64] Chunhua Shen, et al. Twins: Revisiting the Design of Spatial Attention in Vision Transformers, 2021, NeurIPS.