C AN AN I MAGE C LASSIFIER S UFFICE FOR A CTION R ECOGNITION ?
暂无分享,去创建一个
[1] Chong-Wah Ngo,et al. Condensing a Sequence to One Informative Frame for Video Recognition , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[2] Kate Saenko,et al. Dynamic Network Quantization for Efficient Video Inference , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[3] Christoph Feichtenhofer,et al. Multiscale Vision Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[4] Cordelia Schmid,et al. ViViT: A Video Vision Transformer , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[5] Lu Yuan,et al. Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[6] A. Oliva,et al. AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition , 2021, ICLR.
[7] Heng Wang,et al. Is Space-Time Attention All You Need for Video Understanding? , 2021, ICML.
[8] Omri Bar,et al. Video Transformer Network , 2021, 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW).
[9] Parham Aarabi,et al. SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Matthieu Cord,et al. Training data-efficient image transformers & distillation through attention , 2020, ICML.
[11] S. Gelly,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2020, ICLR.
[12] Quanfu Fan,et al. Deep Analysis of CNN-based Spatio-temporal Representations for Action Recognition , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Alexandros Stergiou,et al. Learn to cycle: Time-consistent feature discovery for action recognition , 2020, Pattern Recognit. Lett..
[14] Stephen Lin,et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[15] Chunhua Shen,et al. Twins: Revisiting Spatial Attention Design in Vision Transformers , 2021, ArXiv.
[16] Can Zhang,et al. PAN: Towards Fast Action Recognition via Learning Persistence of Appearance , 2020, ArXiv.
[17] Kate Saenko,et al. AR-Net: Adaptive Frame Resolution for Efficient Action Recognition , 2020, ECCV.
[18] Suha Kwak,et al. MotionSqueeze: Neural Motion Feature Learning for Video Understanding , 2020, ECCV.
[19] Hazel Doughty,et al. Rescaling Egocentric Vision , 2020, ArXiv.
[20] Christoph Feichtenhofer,et al. X3D: Expanding Architectures for Efficient Video Recognition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Bolei Zhou,et al. Temporal Pyramid Network for Action Recognition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Harish G. Ramaswamy,et al. Ablation-CAM: Visual Explanations for Deep Convolutional Network via Gradient-free Localization , 2020, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).
[23] Feiyue Huang,et al. TEINet: Towards an Efficient Architecture for Video Recognition , 2019, AAAI.
[24] Michael S. Ryoo,et al. AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures , 2019, ICLR.
[25] Bolei Zhou,et al. Moments in Time Dataset: One Million Videos for Event Understanding , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[26] Larry S. Davis,et al. A Coarse-to-Fine Framework for Resource Efficient Video Recognition , 2019, International Journal of Computer Vision.
[27] Joanna Materzynska,et al. The Jester Dataset: A Large-Scale Video Dataset of Human Gestures , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).
[28] H. R. Tavakoli,et al. AWSD: Adaptive Weighted Spatiotemporal Distillation for Video Representation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[29] Wei Wu,et al. STM: SpatioTemporal and Motion Encoding for Action Recognition , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[30] Abdenour Hadid,et al. AVD: Adversarial Video Distillation , 2019, ArXiv.
[31] Quoc V. Le,et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks , 2019, ICML.
[32] Seong Joon Oh,et al. CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[33] Chao Li,et al. Collaborative Spatiotemporal Feature Learning for Video Action Recognition , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Jitendra Malik,et al. SlowFast Networks for Video Recognition , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[35] Larry S. Davis,et al. AdaFrame: Adaptive Frame Selection for Fast Video Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Frank Hutter,et al. Decoupled Weight Decay Regularization , 2017, ICLR.
[37] Hassan Foroosh,et al. Still Image Action Recognition by Predicting Spatial-Temporal Pixel Evolution , 2019, 2019 IEEE Winter Conference on Applications of Computer Vision (WACV).
[38] Yi Li,et al. RESOUND: Towards Action Recognition Without Representation Bias , 2018, ECCV.
[39] Chen Sun,et al. Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification , 2017, ECCV.
[40] Bolei Zhou,et al. Temporal Relational Reasoning in Videos , 2017, ECCV.
[41] Abhinav Gupta,et al. Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[42] Hongyi Zhang,et al. mixup: Beyond Empirical Risk Minimization , 2017, ICLR.
[43] Song Han,et al. Temporal Shift Module for Efficient Video Understanding , 2018, ArXiv.
[44] Zhiqiang Shen,et al. Learning Efficient Convolutional Networks through Network Slimming , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[45] Susanne Westphal,et al. The “Something Something” Video Database for Learning and Evaluating Visual Common Sense , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[46] Andrew Zisserman,et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[47] Fabio Viola,et al. The Kinetics Human Action Video Dataset , 2017, ArXiv.
[48] Huimin Ma,et al. Single Image Action Recognition Using Semantic Body Part Actions , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[49] Frank Hutter,et al. SGDR: Stochastic Gradient Descent with Warm Restarts , 2016, ICLR.
[50] Luc Van Gool,et al. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition , 2016, ECCV.
[51] Andrea Vedaldi,et al. Dynamic Image Networks for Action Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[52] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[53] Sergey Ioffe,et al. Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[54] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[55] Fei-Fei Li,et al. ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[56] James W. Davis,et al. The Representation and Recognition of Action Using Temporal Templates , 1997, CVPR 1997.