Rethinking the Design Principles of Robust Vision Transformer
暂无分享,去创建一个
Yuan He | Hui Xue | Shaokai Ye | Yuefeng Chen | Gege Qi | Xiaofeng Mao | Xiaodan Li | Hui Xue | Yuan He | Shaokai Ye | Xiaofeng Mao | Gege Qi | Yuefeng Chen | Xiaodan Li
[1] Luc Van Gool,et al. LocalViT: Bringing Locality to Vision Transformers , 2021, ArXiv.
[2] Sergey Ioffe,et al. Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Seong Joon Oh,et al. Rethinking Spatial Dimensions of Vision Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[4] Tie-Yan Liu,et al. Rethinking Positional Encoding in Language Pre-training , 2020, ICLR.
[5] Kaiming He,et al. Designing Network Design Spaces , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Jonathon Shlens,et al. Explaining and Harnessing Adversarial Examples , 2014, ICLR.
[7] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[8] Matthijs Douze,et al. LeViT: a Vision Transformer in ConvNet’s Clothing for Faster Inference , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[9] Zhuowen Tu,et al. Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Dawn Song,et al. Natural Adversarial Examples , 2019, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Laura Leal-Taixe,et al. TrackFormer: Multi-Object Tracking with Transformers , 2021, ArXiv.
[12] Dmytro Mishkin,et al. Kornia: an Open Source Differentiable Computer Vision Library for PyTorch , 2019, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).
[13] Pichao Wang,et al. TransReID: Transformer-based Object Re-Identification , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[14] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[15] Balaji Lakshminarayanan,et al. AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty , 2020, ICLR.
[16] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Matthieu Cord,et al. Training data-efficient image transformers & distillation through attention , 2020, ICML.
[18] Ashish Vaswani,et al. Self-Attention with Relative Position Representations , 2018, NAACL.
[19] Enhua Wu,et al. Transformer in Transformer , 2021, NeurIPS.
[20] Georg Heigold,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2021, ICLR.
[21] Levent Sagun,et al. ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases , 2021, ICML.
[22] Richard Zhang,et al. Making Convolutional Networks Shift-Invariant Again , 2019, ICML.
[23] Fengwei Yu,et al. Incorporating Convolution Designs into Visual Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[24] Thomas G. Dietterich,et al. Benchmarking Neural Network Robustness to Common Corruptions and Perturbations , 2018, ICLR.
[25] Omer Levy,et al. Are Sixteen Heads Really Better than One? , 2019, NeurIPS.
[26] Quoc V. Le,et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks , 2019, ICML.
[27] Shuicheng Yan,et al. Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet , 2021, ArXiv.
[28] Ling Shao,et al. Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions , 2021, ArXiv.
[29] Amnon Shashua,et al. Inductive Bias of Deep Convolutional Networks through Pooling Geometry , 2016, ICLR.
[30] Aleksander Madry,et al. Towards Deep Learning Models Resistant to Adversarial Attacks , 2017, ICLR.
[31] Eric P. Xing,et al. Learning Robust Global Representations by Penalizing Local Predictive Power , 2019, NeurIPS.
[32] Hongyi Zhang,et al. mixup: Beyond Empirical Risk Minimization , 2017, ICLR.
[33] Bo Zhang,et al. Do We Really Need Explicit Position Encodings for Vision Transformers? , 2021, ArXiv.
[34] Alec Radford,et al. Improving Language Understanding by Generative Pre-Training , 2018 .
[35] Quoc V. Le,et al. Randaugment: Practical automated data augmentation with a reduced search space , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[36] D. Song,et al. The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization , 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[37] Stephen Lin,et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[38] Matthias Bethge,et al. A Simple Way to Make Neural Networks Robust Against Diverse Image Corruptions , 2020, ECCV.