Ralph R. Martin | Shi-Min Hu | Tai-Jiang Mu | Ming-Ming Cheng | Peng-Tao Jiang | Zheng-Ning Liu | Jiang-Jiang Liu | Song-Hai Zhang | Meng-Hao Guo | Tian-Xing Xu
[1] P. Lennie, et al. Early and Late Mechanisms of Surround Suppression in Striate Cortex of Macaque, 2005, The Journal of Neuroscience.
[2] Wenjun Zeng, et al. Multi-Granularity Reference-Aided Attentive Feature Aggregation for Video-Based Person Re-Identification, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Xiaogang Wang, et al. Residual Attention Network for Image Classification, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Yi Zhang, et al. PSANet: Point-wise Spatial Attention Network for Scene Parsing, 2018, ECCV.
[5] Xiaohua Xie, et al. SimAM: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks, 2021, ICML.
[6] Christian Poellabauer, et al. Second-Order Non-Local Attention Networks for Person Re-Identification, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[7] Xiaogang Wang, et al. SCAN: Self-and-Collaborative Attention Network for Video Person Re-Identification, 2018, IEEE Transactions on Image Processing.
[8] Ronald A. Rensink. The Dynamic Representation of Scenes, 2000.
[9] Sainan Liu, et al. Attentional ShapeContextNet for Point Cloud Recognition, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Yee Whye Teh, et al. Set Transformer, 2018, ICML.
[11] Xuming He, et al. LatentGNN: Learning Efficient Non-local Relations for Visual Recognition, 2019, ICML.
[12] Shuguang Cui, et al. PointASNL: Robust Point Clouds Processing Using Nonlocal Neural Networks With Adaptive Sampling, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Cho-Jui Hsieh, et al. When Vision Transformers Outperform ResNets without Pretraining or Strong Data Augmentations, 2021, arXiv.
[14] Alex Graves, et al. DRAW: A Recurrent Neural Network For Image Generation, 2015, ICML.
[15] Jian Yang, et al. Selective Kernel Networks, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[16] In-So Kweon, et al. CBAM: Convolutional Block Attention Module, 2018, ECCV.
[17] D. Ballard, et al. Eye movements in natural behavior, 2005, Trends in Cognitive Sciences.
[18] Weiming Dong, et al. Transformers in computational visual media: A survey, 2021, Computational Visual Media.
[19] C. Qian, et al. TAM: Temporal Adaptive Module for Video Recognition, 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[20] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[21] Matthieu Cord, et al. ResMLP: Feedforward Networks for Image Classification With Data-Efficient Training, 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[22] Qibin Hou, et al. Rotate to Attend: Convolutional Triplet Attention Module, 2020, arXiv.
[23] Hong Liu, et al. Expectation-Maximization Attention Networks for Semantic Segmentation, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[24] Deva Ramanan, et al. Attentional Pooling for Action Recognition, 2017, NIPS.
[25] Hong Liu, et al. Spatial Pyramid Based Graph Reasoning for Semantic Segmentation, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[26] Hongxu Chen, et al. Is Attention Better Than Matrix Decomposition?, 2021, ICLR.
[27] Shi-Min Hu, et al. Beyond Self-Attention: External Attention Using Two Linear Layers for Visual Tasks, 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[28] Xiangyu Zhang, et al. Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Xiaolin Li, et al. Single Shot Text Detector with Regional Attention, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[30] Ralph R. Martin, et al. Can Attention Enable MLPs To Catch Up With CNNs?, 2021, Computational Visual Media.
[31] Jiashi Feng, et al. Coordinate Attention for Efficient Mobile Network Design, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Tat-Seng Chua, et al. SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[33] Loïc Le Folgoc, et al. Attention U-Net: Learning Where to Look for the Pancreas, 2018, arXiv.
[34] Zhuowen Tu, et al. Aggregated Residual Transformations for Deep Neural Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Changhu Wang, et al. Improving Convolutional Networks With Self-Calibrated Convolutions, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Geoffrey E. Hinton, et al. Rectified Linear Units Improve Restricted Boltzmann Machines, 2010, ICML.
[37] Tao Mei, et al. Learning Multi-attention Convolutional Neural Network for Fine-Grained Image Recognition, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[38] Xiaogang Wang, et al. Pyramid Scene Parsing Network, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[39] Xiaogang Wang, et al. Diversity Regularized Spatiotemporal Attention for Video-Based Person Re-identification, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[40] Furu Wei, et al. BEiT: BERT Pre-Training of Image Transformers, 2021, arXiv.
[41] L. Spillmann, et al. Beyond the classical receptive field: The effect of contextual stimuli, 2015, Journal of Vision.
[42] Christof Koch, et al. A Model of Saliency-Based Visual Attention for Rapid Scene Analysis, 1998, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[43] Matthias Zwicker, et al. L2G Auto-encoder: Understanding Point Clouds by Local-to-Global Reconstruction with Hierarchical Self-Attention, 2019, ACM Multimedia.
[44] Chen Sun, et al. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[45] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.
[46] Jun Fu, et al. Dual Attention Network for Scene Segmentation, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[47] Abhinav Shrivastava, et al. GTA: Global Temporal Attention for Video Action Understanding, 2020, BMVC.
[48] Bo Zhao, et al. Diversified Visual Attention Networks for Fine-Grained Object Classification, 2016, IEEE Transactions on Multimedia.
[49] Quoc V. Le, et al. Attention Augmented Convolutional Networks, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[50] Tao Mei, et al. Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[51] Wen Gao, et al. Pre-Trained Image Processing Transformer, 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[52] N. Codella, et al. CvT: Introducing Convolutions to Vision Transformers, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[53] Christian Wolf, et al. Attentional PointNet for 3D-Object Detection in Point Clouds, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[54] Hao Wang, et al. SpSequenceNet: Semantic Segmentation Network on 4D Point Clouds, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[55] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[56] Leonid Sigal, et al. Interpretable Spatio-Temporal Attention for Video Action Recognition, 2018, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).
[57] Shuicheng Yan, et al. VOLO: Vision Outlooker for Visual Recognition, 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[58] Shaogang Gong, et al. Harmonious Attention Network for Person Re-identification, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[59] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[60] Xiaogang Wang, et al. Context Encoding for Semantic Segmentation, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[61] Yue Gao, et al. PVNet: A Joint Convolutional Network of Point Cloud and Multi-View for 3D Shape Recognition, 2018, ACM Multimedia.
[62] Jun Wang, et al. MLCVNet: Multi-Level Context VoteNet for 3D Object Detection, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[63] Shuicheng Yan, et al. Graph-Based Global Reasoning Networks, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[64] Andrew Zisserman, et al. Spatial Transformer Networks, 2015, NIPS.
[65] Cuiling Lan, et al. Relation-Aware Global Attention for Person Re-Identification, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[66] Yang Li, et al. You Look Twice: GaterNet for Dynamic Filter Selection in CNNs, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[67] Gang Sun, et al. Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks, 2018, NeurIPS.
[68] Hossein Mobahi, et al. Sharpness-Aware Minimization for Efficiently Improving Generalization, 2020, arXiv.
[69] Zhe Gan, et al. AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[70] Errui Ding, et al. Compact Generalized Non-local Network, 2018, NeurIPS.
[71] Yi Li, et al. Deformable Convolutional Networks, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[72] Frank Hutter, et al. Decoupled Weight Decay Regularization, 2017, ICLR.
[73] Georg Heigold, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, 2021, ICLR.
[74] Xilin Chen, et al. Object-Contextual Representations for Semantic Segmentation, 2019, ECCV.
[75] Mark Chen, et al. Generative Pretraining From Pixels, 2020, ICML.
[76] Heng Tao Shen, et al. Hierarchical LSTMs with Adaptive Attention for Visual Captioning, 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[77] Bin Li, et al. Deformable DETR: Deformable Transformers for End-to-End Object Detection, 2020, ICLR.
[78] Abhinav Gupta, et al. Non-local Neural Networks, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[79] Matthieu Cord, et al. Going deeper with Image Transformers, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[80] Hyo-Eun Kim, et al. SRM: A Style-Based Recalibration Module for Convolutional Neural Networks, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[81] Wenxiu Sun, et al. Decoupled Spatial-Temporal Transformer for Video Inpainting, 2021, arXiv.
[82] Lu Yuan, et al. Dynamic Convolution: Attention Over Convolution Kernels, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[83] Dumitru Erhan, et al. Going deeper with convolutions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[84] Koray Kavukcuoglu, et al. Visual Attention, 2020, Computational Models for Cognitive Vision.
[85] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[86] Feng Wang, et al. Survey on the attention based RNN model and its applications in computer vision, 2016, arXiv.
[87] Yuxin Peng, et al. Object-Part Attention Model for Fine-Grained Image Classification, 2017, IEEE Transactions on Image Processing.
[88] D. Tao, et al. A Survey on Visual Transformer, 2020, arXiv.
[89] Stephen Lin, et al. Local Relation Networks for Image Recognition, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[90] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[91] Cees Snoek, et al. VideoLSTM convolves, attends and flows for action recognition, 2016, Comput. Vis. Image Underst.
[92] Enhua Wu, et al. Transformer in Transformer, 2021, NeurIPS.
[93] Nassir Navab, et al. Recalibrating Fully Convolutional Networks With Spatial and Channel “Squeeze and Excitation” Blocks, 2018, IEEE Transactions on Medical Imaging.
[94] Shu-Tao Xia, et al. Second-Order Attention Network for Single Image Super-Resolution, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[95] Yi Yang, et al. Gated Channel Transformation for Visual Recognition, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[96] Yu Cheng, et al. Jointly Attentive Spatial-Temporal Pooling Networks for Video-Based Person Re-identification, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[97] Yunchao Wei, et al. CCNet: Criss-Cross Attention for Semantic Segmentation, 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[98] Weihong Deng, et al. Mixed High-Order Attention Network for Person Re-Identification, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[99] Jian Sun, et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[100] Wei Liu, et al. ParseNet: Looking Wider to See Better, 2015, arXiv.
[101] Stephen Lin, et al. Deformable ConvNets V2: More Deformable, Better Results, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[102] Luc Van Gool, et al. Spatio-Temporal Channel Correlation Networks for Action Classification, 2018, ECCV.
[103] Jingdong Wang, et al. OCNet: Object Context Network for Scene Parsing, 2018, arXiv.
[104] Jifeng Dai, et al. FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[105] Ralph R. Martin, et al. Sampling Equivariant Self-Attention Networks for Object Detection in Aerial Images, 2021, IEEE Transactions on Image Processing.
[106] Yoshua Bengio, et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, 2015, ICML.
[107] Jun Zhu, et al. Query2Label: A Simple Transformer Way to Multi-Label Classification, 2021, arXiv.
[108] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[109] Ross B. Girshick, et al. Masked Autoencoders Are Scalable Vision Learners, 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[110] Kaiming He, et al. Group Normalization, 2018, ECCV.
[111] Qilong Wang, et al. Global Second-Order Pooling Convolutional Networks, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[112] Vladlen Koltun, et al. Exploring Self-Attention for Image Recognition, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[113] Fahad Shahbaz Khan, et al. Transformers in Vision: A Survey, 2021, ACM Comput. Surv.
[114] Shuicheng Yan, et al. End-to-End Comparative Attention Networks for Person Re-Identification, 2016, IEEE Transactions on Image Processing.
[115] Xiang Bai, et al. Asymmetric Non-Local Neural Networks for Semantic Segmentation, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[116] Rohan Ramanath, et al. An Attentive Survey of Attention Models, 2019, ACM Trans. Intell. Syst. Technol.
[117] Lars Petersson, et al. Bilinear Attention Networks for Person Retrieval, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[118] Bowen Zhou, et al. A Structured Self-attentive Sentence Embedding, 2017, ICLR.
[119] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[120] Xiaojie Jin, et al. DeepViT: Towards Deeper Vision Transformer, 2021, arXiv.
[121] Alex Graves, et al. Recurrent Models of Visual Attention, 2014, NIPS.
[122] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[123] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[124] Quoc V. Le, et al. CoAtNet: Marrying Convolution and Attention for All Data Sizes, 2021, NeurIPS.
[125] Yu Qiao, et al. Recurrent Spatial-Temporal Attention Network for Action Recognition in Videos, 2018, IEEE Transactions on Image Processing.
[126] Geoffrey E. Hinton, et al. Layer Normalization, 2016, arXiv.
[127] Ashish Vaswani, et al. Self-Attention with Relative Position Representations, 2018, NAACL.
[128] Ralph R. Martin, et al. PCT: Point cloud transformer, 2020, Computational Visual Media.
[129] Zhizhong Han, et al. CF-SIS: Semantic-Instance Segmentation of 3D Point Clouds by Context Fusion with Self-Attention, 2020, ACM Multimedia.
[130] Bingbing Ni, et al. Modeling Point Clouds With Self-Attention and Gumbel Subset Sampling, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[131] Zheng Zhang, et al. Disentangled Non-Local Neural Networks, 2020, ECCV.
[132] Stephen Lin, et al. An Empirical Study of Spatial Attention Mechanisms in Deep Networks, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[133] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[134] Lukasz Kaiser, et al. Rethinking Attention with Performers, 2020, arXiv.
[135] Stephen Lin, et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[136] Cheng Wang, et al. Mancs: A Multi-task Attentional Network with Curriculum Sampling for Person Re-Identification, 2018, ECCV.
[137] Yi Yang, et al. Pedestrian Alignment Network for Large-scale Person Re-Identification, 2017, IEEE Transactions on Circuits and Systems for Video Technology.
[138] Nicolas Usunier, et al. End-to-End Object Detection with Transformers, 2020, ECCV.
[139] Yi Yang, et al. Diagnose like a Radiologist: Attention Guided Convolutional Neural Network for Thorax Disease Classification, 2018, arXiv.
[140] Ning Qian, et al. On the momentum term in gradient descent learning algorithms, 1999, Neural Networks.
[141] Kevin Gimpel, et al. Gaussian Error Linear Units (GELUs), 2016.
[142] Yang Yang, et al. ABD-Net: Attentive but Diverse Person Re-Identification, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[143] Saining Xie, et al. An Empirical Study of Training Self-Supervised Vision Transformers, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[144] Alexander Kolesnikov, et al. MLP-Mixer: An all-MLP Architecture for Vision, 2021, NeurIPS.
[145] Stephen Lin, et al. GCNet: Non-Local Networks Meet Squeeze-Excitation Networks and Beyond, 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).
[146] Ling Shao, et al. Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions, 2021, arXiv.
[147] Jiashi Feng, et al. Strip Pooling: Rethinking Spatial Pooling for Scene Parsing, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[148] Jinhyung Kim, et al. READ: Reciprocal Attention Discriminator for Image-to-Video Re-identification, 2020, ECCV.
[149] Tao Xiang, et al. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers, 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[150] Quoc V. Le, et al. CondConv: Conditionally Parameterized Convolutions for Efficient Inference, 2019, NeurIPS.
[151] Yun Fu, et al. Image Super-Resolution Using Very Deep Residual Channel Attention Networks, 2018, ECCV.
[152] Chongruo Wu, et al. ResNeSt: Split-Attention Networks, 2020, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[153] Han Zhang, et al. Co-Occurrent Features in Semantic Segmentation, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[154] Ashish Vaswani, et al. Stand-Alone Self-Attention in Vision Models, 2019, NeurIPS.
[155] Dongqing Zhang, et al. Neural Aggregation Network for Video Face Recognition, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[156] M. Corbetta, et al. Control of goal-directed and stimulus-driven attention in the brain, 2002, Nature Reviews Neuroscience.
[157] Yichen Wei, et al. Relation Networks for Object Detection, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[158] Anima Anandkumar, et al. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers, 2021, NeurIPS.
[159] Xiaogang Wang, et al. Video Person Re-identification with Competitive Snippet-Similarity Aggregation and Co-attentive Snippet Embedding, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[160] Ruslan Salakhutdinov, et al. Action Recognition using Visual Attention, 2015, NIPS.
[161] Klaus Dietmayer, et al. Point Transformer, 2020, IEEE Access.
[162] Jürgen Schmidhuber, et al. Training Very Deep Networks, 2015, NIPS.
[163] Guodong Guo, et al. Hierarchical Pyramid Diverse Attention Networks for Face Recognition, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[164] In-So Kweon, et al. BAM: Bottleneck Attention Module, 2018, BMVC.
[165] Yun Fu, et al. Tell Me Where to Look: Guided Attention Inference Network, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[166] Thomas Serre, et al. Learning what and where to attend, 2018, ICLR.
[167] Shuicheng Yan, et al. Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet, 2021, arXiv.
[168] Shuicheng Yan, et al. A2-Nets: Double Attention Networks, 2018, NeurIPS.
[169] Jiebo Luo, et al. Looking for the Devil in the Details: Learning Trilinear Attention Sampling Network for Fine-Grained Image Recognition, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[170] Wenjun Zeng, et al. An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data, 2016, AAAI.
[171] Yongdong Zhang, et al. STAT: Spatial-Temporal Attention Mechanism for Video Captioning, 2020, IEEE Transactions on Multimedia.
[172] Julien Mairal, et al. Emerging Properties in Self-Supervised Vision Transformers, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[173] Jing Xu, et al. Attention-Aware Compositional Network for Person Re-identification, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[174] Yiming Yang, et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context, 2019, ACL.
[175] Xiaogang Wang, et al. Multi-context Attention for Human Pose Estimation, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[176] Fei Wu, et al. FcaNet: Frequency Channel Attention Networks, 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[177] Furu Wei, et al. VL-BERT: Pre-training of Generic Visual-Linguistic Representations, 2019, ICLR.