Weiming Dong | Xing Sun | Kekai Sheng | Changsheng Xu | Ke Li | Liqing Zhang | Zhijie Zhang | Yifan Xu | Mengdan Zhang