SAST: Learning Semantic Action-Aware Spatial-Temporal Features for Efficient Action Recognition
Fei Wang | Hao Chu | Guorui Wang | Yunwen Huang