Kai Han | Yehui Tang | Chang Xu | Jianyuan Guo | Chao Xu | Yunhe Wang | Xinghao Chen | Han Wu
[1] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2015, ICLR.
[2] Sergey Ioffe, et al. Rethinking the Inception Architecture for Computer Vision, 2016, CVPR.
[3] Fei-Fei Li, et al. Large-Scale Video Classification with Convolutional Neural Networks, 2014, CVPR.
[4] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[5] Yoshua Bengio, et al. Deep Sparse Rectifier Neural Networks, 2011, AISTATS.
[6] Kaiming He, et al. Designing Network Design Spaces, 2020, CVPR.
[7] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.
[8] Georg Heigold, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, 2021, ICLR.
[9] Seong Joon Oh, et al. CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features, 2019, ICCV.
[10] Shuicheng Yan, et al. Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition, 2021, IEEE TPAMI.
[11] Nuno Vasconcelos, et al. Cascade R-CNN: Delving Into High Quality Object Detection, 2018, CVPR.
[12] Dumitru Erhan, et al. Going deeper with convolutions, 2015, CVPR.
[13] D. Tao, et al. A Survey on Visual Transformer, 2020, arXiv.
[14] Frank Hutter, et al. Decoupled Weight Decay Regularization, 2019, ICLR.
[15] Enhua Wu, et al. Transformer in Transformer, 2021, NeurIPS.
[16] Matthieu Cord, et al. Training data-efficient image transformers & distillation through attention, 2021, ICML.
[17] Kevin Gimpel, et al. Gaussian Error Linear Units (GELUs), 2016, arXiv.
[18] Kaiming He, et al. Panoptic Feature Pyramid Networks, 2019, CVPR.
[19] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, NeurIPS.
[20] Ross B. Girshick, et al. Mask R-CNN, 2017, ICCV.
[21] Ross B. Girshick, et al. Focal Loss for Dense Object Detection, 2020, IEEE TPAMI.
[22] Jian Sun, et al. Identity Mappings in Deep Residual Networks, 2016, ECCV.
[23] Quoc V. Le, et al. Pay Attention to MLPs, 2021, NeurIPS.
[24] Xing Sun, et al. AS-MLP: An Axial Shifted MLP Architecture for Vision, 2021, arXiv.
[25] Quoc V. Le, et al. RandAugment: Practical automated data augmentation with a reduced search space, 2020, CVPR Workshops.
[26] Quoc V. Le, et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, 2019, ICML.
[27] Bo Chen, et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, 2017, arXiv.
[28] Kilian Q. Weinberger, et al. Deep Networks with Stochastic Depth, 2016, ECCV.
[29] Bolei Zhou, et al. Semantic Understanding of Scenes Through the ADE20K Dataset, 2019, IJCV.
[30] Alexander Kolesnikov, et al. MLP-Mixer: An all-MLP Architecture for Vision, 2021, NeurIPS.
[31] Ling Shao, et al. Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions, 2021, arXiv.
[32] Kaiming He, et al. Focal Loss for Dense Object Detection, 2017, ICCV.
[33] Chunhua Shen, et al. Twins: Revisiting the Design of Spatial Attention in Vision Transformers, 2021, arXiv.
[34] Yunfeng Cai, et al. S2-MLP: Spatial-Shift MLP Architecture for Vision, 2022, WACV.
[35] George Papandreou, et al. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation, 2018, ECCV.
[36] Kaitao Song, et al. PVTv2: Improved Baselines with Pyramid Vision Transformer, 2021, arXiv.
[37] Jiwen Lu, et al. Global Filter Networks for Image Classification, 2021, NeurIPS.
[38] Kaiming He, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, 2017, IEEE TPAMI.
[39] Ding Liang, et al. CycleMLP: A MLP-like Architecture for Dense Prediction, 2021, arXiv.
[40] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[41] Stephen Lin, et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, 2021, ICCV.
[42] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[43] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.
[44] Hongyi Zhang, et al. mixup: Beyond Empirical Risk Minimization, 2018, ICLR.
[45] Xiangyu Zhang, et al. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices, 2018, CVPR.
[46] Matthieu Cord, et al. ResMLP: Feedforward Networks for Image Classification With Data-Efficient Training, 2021, IEEE TPAMI.
[47] N. Codella, et al. CvT: Introducing Convolutions to Vision Transformers, 2021, ICCV.
[48] Xinggang Wang, et al. What Makes for Hierarchical Vision Transformer?, 2021, arXiv.
[49] Lukasz Kaiser, et al. Attention is All you Need, 2017, NeurIPS.
[50] Yunfeng Cai, et al. S2-MLPv2: Improved Spatial-Shift MLP Architecture for Vision, 2021, arXiv.
[51] Trevor Darrell, et al. Early Convolutions Help Transformers See Better, 2021, NeurIPS.
[52] Kai Chen, et al. MMDetection: Open MMLab Detection Toolbox and Benchmark, 2019, arXiv.
[53] Shuicheng Yan, et al. Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet, 2021, arXiv.
[54] Yi Yang, et al. Random Erasing Data Augmentation, 2020, AAAI.
[55] Yuning Jiang, et al. Unified Perceptual Parsing for Scene Understanding, 2018, ECCV.
[56] Fengwei Yu, et al. Incorporating Convolution Designs into Visual Transformers, 2021, ICCV.