FLatten Transformer: Vision Transformer using Focused Linear Attention
[1] S. Song, et al. Dynamic Perceiver for Efficient Visual Recognition, 2023, ArXiv.
[2] S. Song, et al. Adaptive Rotated Convolution for Rotated Object Detection, 2023, ArXiv.
[3] Yu Cao, et al. Deep Incubation: Training Large Models by Divide-and-Conquering, 2022, 2023 IEEE/CVF International Conference on Computer Vision (ICCV).
[4] Bichen Wu, et al. Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision Transformer Inference, 2022, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[5] S. Song, et al. EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones, 2022, ArXiv.
[6] S. Song, et al. Contrastive Language-Image Pre-Training with Knowledge Graphs, 2022, NeurIPS.
[7] S. Song, et al. Latency-aware Spatial-wise Dynamic Networks, 2022, NeurIPS.
[8] S. Song, et al. Learning to Weight Samples for Dynamic Early-exiting Networks, 2022, ECCV.
[9] Cheng-Yang Fu, et al. Hydra Attention: Efficient Attention with Many Heads, 2022, ECCV Workshops.
[10] Zhirong Yang, et al. Paramixer: Parameterizing Mixing Links in Sparse Factors Works Better than Dot-Product Self-Attention, 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Humphrey Shi, et al. Neighborhood Attention Transformer, 2022, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Ross B. Girshick, et al. Exploring Plain Vision Transformer Backbones for Object Detection, 2022, ECCV.
[13] Junjie Yan, et al. cosFormer: Rethinking Softmax in Attention, 2022, ICLR.
[14] S. Song, et al. Vision Transformer with Deformable Attention, 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[15] S. Song, et al. On the Integration of Self-Attention and Convolution, 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[16] T. Xiang, et al. SOFT: Softmax-free Transformer with Linear Complexity, 2021, NeurIPS.
[17] Mohammad Rastegari, et al. MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer, 2021, ICLR.
[18] Lu Yuan, et al. Mobile-Former: Bridging MobileNet and Transformer, 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Kai Han, et al. CMT: Convolutional Neural Networks Meet Vision Transformers, 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[20] Alexander G. Schwing, et al. Per-Pixel Classification is Not All You Need for Semantic Segmentation, 2021, NeurIPS.
[21] Jure Leskovec, et al. Combiner: Full Attention Transformer with Sparse Computation Cost, 2021, NeurIPS.
[22] Nenghai Yu, et al. CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows, 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Trevor Darrell, et al. Early Convolutions Help Transformers See Better, 2021, NeurIPS.
[24] P. Luo, et al. PVT v2: Improved baselines with Pyramid Vision Transformer, 2021, Computational Visual Media.
[25] Shuicheng Yan, et al. VOLO: Vision Outlooker for Visual Recognition, 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[26] Luke Zettlemoyer, et al. Luna: Linear Unified Nested Attention, 2021, NeurIPS.
[27] Anima Anandkumar, et al. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers, 2021, NeurIPS.
[28] Zeyi Huang, et al. Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition, 2021, NeurIPS.
[29] Ilya Sutskever, et al. Learning Transferable Visual Models From Natural Language Supervision, 2021, ICML.
[30] Xiang Li, et al. Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[31] Gao Huang, et al. Dynamic Neural Networks: A Survey, 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[32] Glenn M. Fung, et al. Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention, 2021, AAAI.
[33] Francis E. H. Tay, et al. Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[34] Matthieu Cord, et al. Training data-efficient image transformers & distillation through attention, 2020, ICML.
[35] S. Gelly, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, 2020, ICLR.
[36] Le Yang, et al. Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in Image Classification, 2020, NeurIPS.
[37] Bin Li, et al. Deformable DETR: Deformable Transformers for End-to-End Object Detection, 2020, ICLR.
[38] Lucy J. Colwell, et al. Rethinking Attention with Performers, 2020, ICLR.
[39] Nikolaos Pappas, et al. Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention, 2020, ICML.
[40] Nicolas Usunier, et al. End-to-End Object Detection with Transformers, 2020, ECCV.
[41] Le Yang, et al. Resolution Adaptive Networks for Efficient Inference, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Lukasz Kaiser, et al. Reformer: The Efficient Transformer, 2020, ICLR.
[43] Xuancheng Ren, et al. Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection, 2019, ArXiv.
[44] Quoc V. Le, et al. Randaugment: Practical automated data augmentation with a reduced search space, 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[45] Seong Joon Oh, et al. CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[46] Kaiming He, et al. Panoptic Feature Pyramid Networks, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[47] Shuai Yi, et al. Efficient Attention: Attention with Linear Complexities, 2018, 2021 IEEE Winter Conference on Applications of Computer Vision (WACV).
[48] Yuning Jiang, et al. Unified Perceptual Parsing for Scene Understanding, 2018, ECCV.
[49] Mark Sandler, et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[50] Nuno Vasconcelos, et al. Cascade R-CNN: Delving Into High Quality Object Detection, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[51] Frank Hutter, et al. Decoupled Weight Decay Regularization, 2017, ICLR.
[52] Hongyi Zhang, et al. mixup: Beyond Empirical Risk Minimization, 2017, ICLR.
[53] Yi Yang, et al. Random Erasing Data Augmentation, 2017, AAAI.
[54] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[55] Ross B. Girshick, et al. Mask R-CNN, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[56] Bolei Zhou, et al. Semantic Understanding of Scenes Through the ADE20K Dataset, 2016, International Journal of Computer Vision.
[57] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.
[58] Fei-Fei Li, et al. ImageNet: A large-scale hierarchical image database, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[59] Boris Polyak, et al. Acceleration of stochastic approximation by averaging, 1992, SIAM Journal on Control and Optimization.
[60] Han Cai, et al. EfficientViT: Enhanced Linear Attention for High-Resolution Low-Computation Visual Recognition, 2022, ArXiv.
[61] Stephen Lin, et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).