Yali Wang | Luoqi Liu | Mike Zheng Shou | Kunchang Li | Yu Qiao | Yunpeng Chen | David Junhao Zhang | Shashwat Chandra
[1] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Susanne Westphal,et al. The “Something Something” Video Database for Learning and Evaluating Visual Common Sense , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[3] Quanfu Fan,et al. More Is Less: Learning Efficient Video Representations by Big-Little Network and Depthwise Temporal Aggregation , 2019, NeurIPS.
[4] Anima Anandkumar,et al. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers , 2021, NeurIPS.
[5] Kaiming He,et al. Panoptic Feature Pyramid Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Youngsaeng Jin,et al. TrSeg: Transformer for semantic segmentation , 2021, Pattern Recognit. Lett.
[7] Quoc V. Le,et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks , 2019, ICML.
[8] Ding Liang,et al. CycleMLP: A MLP-like Architecture for Dense Prediction , 2021, ArXiv.
[9] Frank Hutter,et al. SGDR: Stochastic Gradient Descent with Warm Restarts , 2016, ICLR.
[10] Bo Chen,et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.
[11] Yann LeCun,et al. A Closer Look at Spatiotemporal Convolutions for Action Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Wenjun Zeng,et al. Sparse MLP for Image Recognition: Is Self-Attention Really Necessary? , 2021, ArXiv.
[13] Yunchao Wei,et al. CCNet: Criss-Cross Attention for Semantic Segmentation , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[14] Kaiming He,et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour , 2017, ArXiv.
[15] Heng Wang,et al. Video Modeling With Correlation Networks , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Heng Wang,et al. Video Classification With Channel-Separated Convolutional Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[17] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[18] Cordelia Schmid,et al. ViViT: A Video Vision Transformer , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[19] Ivan Marsic,et al. VidTr: Video Transformer Without Convolutions , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[20] N. Codella,et al. CvT: Introducing Convolutions to Vision Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[21] Matthieu Cord,et al. Training data-efficient image transformers & distillation through attention , 2020, ICML.
[22] Bin Li,et al. Deformable DETR: Deformable Transformers for End-to-End Object Detection , 2020, ICLR.
[23] Abhinav Gupta,et al. Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[25] Andrea Vedaldi,et al. Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers , 2021, NeurIPS.
[26] Bolei Zhou,et al. Semantic Understanding of Scenes Through the ADE20K Dataset , 2016, International Journal of Computer Vision.
[27] C. Qian,et al. TAM: Temporal Adaptive Module for Video Recognition , 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[28] Xiang Bai,et al. Asymmetric Non-Local Neural Networks for Semantic Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[29] Jitendra Malik,et al. SlowFast Networks for Video Recognition , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[30] Alexander Kolesnikov,et al. MLP-Mixer: An all-MLP Architecture for Vision , 2021, NeurIPS.
[31] Frank Hutter,et al. Fixing Weight Decay Regularization in Adam , 2017, ArXiv.
[32] Ling Shao,et al. Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions , 2021, ArXiv.
[33] Bin Kang,et al. TEA: Temporal Excitation and Aggregation for Action Recognition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Abhinav Gupta,et al. Videos as Space-Time Region Graphs , 2018, ECCV.
[35] Feiyue Huang,et al. TEINet: Towards an Efficient Architecture for Video Recognition , 2019, AAAI.
[36] Yali Wang,et al. SmallBigNet: Integrating Core and Contextual Views for Video Classification , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[37] Zhuowen Tu,et al. Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Mark Sandler,et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[39] Wei Wu,et al. STM: SpatioTemporal and Motion Encoding for Action Recognition , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[40] Christoph Feichtenhofer,et al. X3D: Expanding Architectures for Efficient Video Recognition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[41] Shuicheng Yan,et al. Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet , 2021, ArXiv.
[42] Limin Wang,et al. TDN: Temporal Difference Networks for Efficient Action Recognition , 2020, ArXiv.
[43] Roozbeh Mottaghi,et al. Container: Context Aggregation Network , 2021, NeurIPS.
[44] Kilian Q. Weinberger,et al. Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[45] Stephen Lin,et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[46] Nicolas Usunier,et al. End-to-End Object Detection with Transformers , 2020, ECCV.
[47] Luc Van Gool,et al. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition , 2016, ECCV.
[48] Yunfeng Cai,et al. S2-MLP: Spatial-Shift MLP Architecture for Vision , 2021, 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV).
[49] Jiwen Lu,et al. Global Filter Networks for Image Classification , 2021, NeurIPS.
[50] Lu Yuan,et al. Focal Self-attention for Local-Global Interactions in Vision Transformers , 2021, ArXiv.
[51] Omri Bar,et al. Video Transformer Network , 2021, 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW).
[52] Nenghai Yu,et al. CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[53] Kaiming He,et al. Designing Network Design Spaces , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[54] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[55] Georg Heigold,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2021, ICLR.
[56] Andrew Zisserman,et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[57] Alan Yuille,et al. Grouped Spatial-Temporal Aggregation for Efficient Action Recognition , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[58] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[59] Gedas Bertasius,et al. Is Space-Time Attention All You Need for Video Understanding? , 2021, ICML.
[60] Enhua Wu,et al. Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[61] Georgios Tzimiropoulos,et al. Space-time Mixing Attention for Video Transformer , 2021, NeurIPS.
[62] Shuicheng Yan,et al. Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition , 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[63] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[64] Christoph Feichtenhofer,et al. Multiscale Vision Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[65] Chuang Gan,et al. TSM: Temporal Shift Module for Efficient Video Understanding , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[66] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[67] Quoc V. Le,et al. CoAtNet: Marrying Convolution and Attention for All Data Sizes , 2021, NeurIPS.
[68] Kurt Keutzer,et al. Visual Transformers: Token-based Image Representation and Processing for Computer Vision , 2020, ArXiv.
[69] Matthieu Cord,et al. ResMLP: Feedforward Networks for Image Classification With Data-Efficient Training , 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[70] Kai Han,et al. Hire-MLP: Vision MLP via Hierarchical Rearrangement , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[71] Pieter Abbeel,et al. Bottleneck Transformers for Visual Recognition , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[72] Shuicheng Yan,et al. A2-Nets: Double Attention Networks , 2018, NeurIPS.
[73] Yu Qiao,et al. CT-Net: Channel Tensorization Network for Video Classification , 2021, ICLR.