Song Bai | Philip H. S. Torr | Shuyang Sun | Xiaoyu Yue
[1] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[2] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Yichen Wei,et al. Relation Networks for Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[4] Enhua Wu,et al. Transformer in Transformer , 2021, NeurIPS.
[5] Georg Heigold,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2021, ICLR.
[6] Quoc V. Le,et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks , 2019, ICML.
[7] N. Codella,et al. CvT: Introducing Convolutions to Vision Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[8] Tim Salimans,et al. Axial Attention in Multidimensional Transformers , 2019, ArXiv.
[9] Ross B. Girshick,et al. Fast R-CNN , 2015, arXiv:1504.08083.
[10] Bo Zhang,et al. Do We Really Need Explicit Position Encodings for Vision Transformers? , 2021, ArXiv.
[11] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[12] Iasonas Kokkinos,et al. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.
[13] Kurt Keutzer,et al. Visual Transformers: Token-based Image Representation and Processing for Computer Vision , 2020, ArXiv.
[14] Irwan Bello. LambdaNetworks: Modeling Long-Range Interactions Without Attention , 2021, ICLR.
[15] Kilian Q. Weinberger,et al. Deep Networks with Stochastic Depth , 2016, ECCV.
[16] Kevin Gimpel,et al. Gaussian Error Linear Units (GELUs) , 2016.
[17] Shuai Yi,et al. FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction , 2019, NeurIPS.
[18] Yann Dauphin,et al. Pay Less Attention with Lightweight and Dynamic Convolutions , 2019, ICLR.
[19] Levent Sagun,et al. ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases , 2021, ICML.
[20] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[21] Ross B. Girshick,et al. Mask R-CNN , 2017, arXiv:1703.06870.
[22] Geoffrey E. Hinton,et al. How to Represent Part-Whole Hierarchies in a Neural Network , 2021, Neural Computation.
[23] Wei Liu,et al. SSD: Single Shot MultiBox Detector , 2015, ECCV.
[24] Geoffrey E. Hinton,et al. Dynamic Routing Between Capsules , 2017, NIPS.
[25] K. Simonyan,et al. High-Performance Large-Scale Image Recognition Without Normalization , 2021, ICML.
[26] Ling Shao,et al. Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions , 2021, ArXiv.
[27] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[28] Martin Jaggi,et al. On the Relationship between Self-Attention and Convolutional Layers , 2019, ICLR.
[29] Jason Weston,et al. End-To-End Memory Networks , 2015, NIPS.
[30] Matthieu Cord,et al. Training data-efficient image transformers & distillation through attention , 2020, ICML.
[31] Xiaogang Wang,et al. Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.
[33] Yiming Yang,et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding , 2019, NeurIPS.
[34] Nitish Srivastava,et al. Capsules with Inverted Dot-Product Attention Routing , 2020, ICLR.
[35] Vladlen Koltun,et al. Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.
[36] Abhinav Gupta,et al. Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[37] Kaiming He,et al. Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Geoffrey E. Hinton,et al. Matrix capsules with EM routing , 2018, ICLR.
[39] Shuicheng Yan,et al. A2-Nets: Double Attention Networks , 2018, NeurIPS.
[40] Ashish Vaswani,et al. Stand-Alone Self-Attention in Vision Models , 2019, NeurIPS.
[41] Kai Chen,et al. MMDetection: Open MMLab Detection Toolbox and Benchmark , 2019, ArXiv.
[42] Yee Whye Teh,et al. Stacked Capsule Autoencoders , 2019, NeurIPS.
[43] Roberto Cipolla,et al. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[44] Shuicheng Yan,et al. Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet , 2021, ArXiv.
[45] Nuno Vasconcelos,et al. Cascade R-CNN: Delving Into High Quality Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[46] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[47] Kaiming He,et al. Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[48] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[49] Luc Van Gool,et al. Dynamic Filter Networks , 2016, NIPS.
[50] Geoffrey E. Hinton. Some Demonstrations of the Effects of Structural Descriptions in Mental Imagery , 1979, Cogn. Sci.
[51] Enhua Wu,et al. Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[52] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[53] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[54] Jonathon Shlens,et al. Scaling Local Self-Attention for Parameter Efficient Visual Backbones , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[55] Ross B. Girshick,et al. Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[56] Yiming Yang,et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context , 2019, ACL.
[57] Georg Heigold,et al. Object-Centric Learning with Slot Attention , 2020, NeurIPS.
[58] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[59] Kilian Q. Weinberger,et al. Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[60] Stephen Lin,et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[61] Nicolas Usunier,et al. End-to-End Object Detection with Transformers , 2020, ECCV.
[62] Seong Joon Oh,et al. Rethinking Spatial Dimensions of Vision Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[63] D. Kahneman,et al. The reviewing of object files: Object-specific integration of information , 1992, Cognitive Psychology.
[64] Quoc V. Le,et al. Attention Augmented Convolutional Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[65] Shuicheng Yan,et al. Graph-Based Global Reasoning Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[66] Zhuowen Tu,et al. Image Parsing: Unifying Segmentation, Detection, and Recognition , 2005, International Journal of Computer Vision.
[67] Vladlen Koltun,et al. Exploring Self-Attention for Image Recognition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[68] Pieter Abbeel,et al. Bottleneck Transformers for Visual Recognition , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[69] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[70] Christoph Feichtenhofer,et al. Multiscale Vision Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[71] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[72] Trevor Darrell,et al. Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[73] Kaiming He,et al. Designing Network Design Spaces , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).