暂无分享,去创建一个
Kaitao Song | Ding Liang | Enze Xie | Deng-Ping Fan | Ping Luo | Wenhai Wang | Ling Shao | Xiang Li | Tong Lu
[1] Yi Jiang,et al. Sparse R-CNN: End-to-End Object Detection with Learnable Proposals , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Stephen Lin,et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[3] Quanfu Fan,et al. CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[4] Hongyi Zhang,et al. mixup: Beyond Empirical Risk Minimization , 2017, ICLR.
[5] Georg Heigold,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2021, ICLR.
[6] Kaiming He,et al. Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[7] Jian Sun,et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[8] Kai Chen,et al. MMDetection: Open MMLab Detection Toolbox and Benchmark , 2019, ArXiv.
[9] Shuicheng Yan,et al. Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet , 2021, ArXiv.
[10] Kevin Gimpel,et al. Gaussian Error Linear Units (GELUs) , 2016 .
[11] Ross B. Girshick,et al. Mask R-CNN , 2017, 1703.06870.
[12] Ling Shao,et al. Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions , 2021, ArXiv.
[13] Luc Van Gool,et al. LocalViT: Bringing Locality to Vision Transformers , 2021, ArXiv.
[14] Sergey Ioffe,et al. Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Matthieu Cord,et al. Training data-efficient image transformers & distillation through attention , 2020, ICML.
[16] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.
[17] Ross B. Girshick,et al. Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[18] Kaiming He,et al. Designing Network Design Spaces , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[19] N. Codella,et al. CvT: Introducing Convolutions to Vision Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[20] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Enhua Wu,et al. Transformer in Transformer , 2021, NeurIPS.
[22] Jun Li,et al. Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection , 2020, NeurIPS.
[23] Bo Chen,et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.
[24] Zhuowen Tu,et al. Co-Scale Conv-Attentional Image Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[25] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[26] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[27] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[28] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[29] Chunhua Shen,et al. Twins: Revisiting the Design of Spatial Attention in Vision Transformers , 2021, NeurIPS.
[30] Shifeng Zhang,et al. Bridging the Gap Between Anchor-Based and Anchor-Free Detection via Adaptive Training Sample Selection , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Matthijs Douze,et al. LeViT: a Vision Transformer in ConvNet’s Clothing for Faster Inference , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[32] Zhuowen Tu,et al. Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[33] Yi Yang,et al. Random Erasing Data Augmentation , 2017, AAAI.
[34] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[35] Frank Hutter,et al. SGDR: Stochastic Gradient Descent with Warm Restarts , 2016, ICLR.
[36] Frank Hutter,et al. Decoupled Weight Decay Regularization , 2017, ICLR.
[37] Sen Jia,et al. How Much Position Information Do Convolutional Neural Networks Encode? , 2020, ICLR.
[38] Nuno Vasconcelos,et al. Cascade R-CNN: Delving Into High Quality Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.