Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition
暂无分享,去创建一个
Suha Kwak | Minsu Cho | Heeseung Kwon | Manjin Kim | Suha Kwak | Minsu Cho | Heeseung Kwon | Manjin Kim
[1] Susanne Westphal,et al. The “Something Something” Video Database for Learning and Evaluating Visual Common Sense , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[2] Jan Kautz,et al. PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[3] Bolei Zhou,et al. Temporal Relational Reasoning in Videos , 2017, ECCV.
[4] Frank Hutter,et al. SGDR: Stochastic Gradient Descent with Warm Restarts , 2016, ICLR.
[5] Ashish Vaswani,et al. Stand-Alone Self-Attention in Vision Models , 2019, NeurIPS.
[6] Yann LeCun,et al. A Closer Look at Spatiotemporal Convolutions for Action Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[7] Hailin Jin,et al. Learning Video Representations From Correspondence Proposals , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Shanmuganathan Raman,et al. Attentive Spatio-Temporal Representation Learning for Diving Classification , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[9] Andrew Zisserman,et al. Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.
[10] Yue Zhao,et al. FineGym: A Hierarchical Video Dataset for Fine-Grained Action Understanding , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Bolei Zhou,et al. Temporal Pyramid Network for Action Recognition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Wei Wu,et al. STM: SpatioTemporal and Motion Encoding for Action Recognition , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[13] Abhinav Gupta,et al. Videos as Space-Time Region Graphs , 2018, ECCV.
[14] Stephen Lin,et al. Local Relation Networks for Image Recognition , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[15] Michael S. Ryoo,et al. Representation Flow for Action Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Yi Li,et al. RESOUND: Towards Action Recognition Without Representation Bias , 2018, ECCV.
[17] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[18] Juan Carlos Niebles,et al. RubiksNet: Learnable 3D-Shift for Efficient Video Action Recognition , 2020, ECCV.
[19] Vladlen Koltun,et al. Exploring Self-Attention for Image Recognition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[20] Pieter Abbeel,et al. Bottleneck Transformers for Visual Recognition , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Alan Yuille,et al. Grouped Spatial-Temporal Aggregation for Efficient Action Recognition , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[22] Jitendra Malik,et al. SlowFast Networks for Video Recognition , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[23] Yali Wang,et al. SmallBigNet: Integrating Core and Contextual Views for Video Classification , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Georg Heigold,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2021, ICLR.
[25] Andrew Zisserman,et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[26] Chuang Gan,et al. End-to-End Learning of Motion Representation for Video Understanding , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[27] Deva Ramanan,et al. Volumetric Correspondence Networks for Optical Flow , 2019, NeurIPS.
[28] Minsu Cho,et al. First Person Action Recognition via Two-stream ConvNet with Long-term Fusion Pooling , 2018, Pattern Recognit. Lett..
[29] Minsu Cho,et al. Relational Embedding for Few-Shot Classification , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[30] Thomas Brox,et al. ECO: Efficient Convolutional Network for Online Video Understanding , 2018, ECCV.
[31] Sergio Escalera,et al. Gate-Shift Networks for Video Action Recognition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[33] Trevor Darrell,et al. Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Nojun Kwak,et al. Motion Feature Network: Fixed Motion Filter for Action Recognition , 2018, ECCV.
[35] Horst Bischof,et al. A Duality Based Approach for Realtime TV-L1 Optical Flow , 2007, DAGM-Symposium.
[36] Luc Van Gool,et al. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition , 2016, ECCV.
[37] Xianglong Liu,et al. Spatio-temporal deformable 3D ConvNets with attention for action recognition , 2020, Pattern Recognit..
[38] Gedas Bertasius,et al. Is Space-Time Attention All You Need for Video Understanding? , 2021, ICML.
[39] Seungryong Kim,et al. FCSS: Fully Convolutional Self-Similarity for Dense Semantic Correspondence , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[40] Patrick Pérez,et al. Cross-View Action Recognition from Temporal Self-similarities , 2008, ECCV.
[41] Patrick Pérez,et al. View-Independent Action Recognition from Temporal Self-Similarities , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[42] Kaiming He,et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour , 2017, ArXiv.
[43] Wei Zhang,et al. Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[44] Heng Wang,et al. Video Classification With Channel-Separated Convolutional Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[45] Guillaume-Alexandre Bilodeau,et al. Local self-similarity-based registration of human ROIs in pairs of stereo thermal-visible videos , 2013, Pattern Recognit..
[46] Suha Kwak,et al. MotionSqueeze: Neural Motion Feature Learning for Video Understanding , 2020, ECCV.
[47] Fei-Fei Li,et al. Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[48] Deva Ramanan,et al. Attentional Pooling for Action Recognition , 2017, NIPS.
[49] Heng Wang,et al. Video Modeling With Correlation Networks , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[50] Abhishek Das,et al. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[51] Abhinav Gupta,et al. Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[52] Thomas Brox,et al. FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[53] Dahua Lin,et al. Trajectory Convolution for Action Recognition , 2018, NeurIPS.
[54] Thomas G. Dietterich,et al. Benchmarking Neural Network Robustness to Common Corruptions and Perturbations , 2018, ICLR.
[55] Bin Kang,et al. TEA: Temporal Excitation and Aggregation for Action Recognition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[56] Feiyue Huang,et al. TEINet: Towards an Efficient Architecture for Video Recognition , 2019, AAAI.
[57] Minsu Cho,et al. IntegralAction: Pose-driven Feature Integration for Robust Human Action Recognition in Videos , 2020, ArXiv.
[58] Trevor Darrell,et al. Something-Else: Compositional Action Recognition With Spatial-Temporal Interaction Networks , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[59] Christoph Feichtenhofer,et al. X3D: Expanding Architectures for Efficient Video Recognition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[60] Fabio Viola,et al. The Kinetics Human Action Video Dataset , 2017, ArXiv.
[61] Chuang Gan,et al. TSM: Temporal Shift Module for Efficient Video Understanding , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[62] Mingrui Wu,et al. Gradient descent optimization of smoothed information retrieval metrics , 2010, Information Retrieval.
[63] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).