Actor-Transformers for Group Activity Recognition
暂无分享,去创建一个
Cees G. M. Snoek | Ryan Sanford | Kirill Gavrilyuk | Mehrsan Javan | Kirill Gavrilyuk | Mehrsan Javan | Ryan Sanford
[1] Baoxin Li,et al. Multi-stream CNN: Learning representations based on human-related regions for action recognition , 2018, Pattern Recognit..
[2] Ross B. Girshick,et al. Mask R-CNN , 2017, 1703.06870.
[3] Gang Wang,et al. NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Silvio Savarese,et al. A Unified Framework for Multi-target Tracking and Collective Activity Recognition , 2012, ECCV.
[5] Greg Mori,et al. Hierarchical Relational Networks for Group Activity Recognition and Retrieval , 2018, ECCV.
[6] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[7] Silvio Savarese,et al. Learning context for collective activity recognition , 2011, CVPR 2011.
[8] Yong Du,et al. Hierarchical recurrent neural network for skeleton based action recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Guillaume Lample,et al. Cross-lingual Language Model Pretraining , 2019, NeurIPS.
[10] Li Wang,et al. Learning Actor Relation Graphs for Group Activity Recognition , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Pichao Wang,et al. Skeleton Optical Spectra-Based Action Recognition Using Convolutional Neural Networks , 2018, IEEE Transactions on Circuits and Systems for Video Technology.
[12] Luc Van Gool,et al. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition , 2016, ECCV.
[13] Trevor Darrell,et al. Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Yiming Yang,et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding , 2019, NeurIPS.
[15] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[16] Cees Snoek,et al. VideoLSTM convolves, attends and flows for action recognition , 2016, Comput. Vis. Image Underst..
[17] Yang Wang,et al. Discriminative Latent Models for Recognizing Contextual Group Activities , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[18] Bingbing Ni,et al. Recurrent Modeling of Interaction Context for Collective Activity Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Dustin Tran,et al. Image Transformer , 2018, ICML.
[20] Lukás Burget,et al. Extensions of recurrent neural network language model , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[22] Deva Ramanan,et al. Attentional Pooling for Action Recognition , 2017, NIPS.
[23] Greg Mori,et al. Structure Inference Machines: Recurrent Neural Networks for Analyzing Relations in Group Activity Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[25] Song-Chun Zhu,et al. Joint action recognition and pose estimation from video , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[26] Yiming Yang,et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context , 2019, ACL.
[27] Abhinav Gupta,et al. Videos as Space-Time Region Graphs , 2018, ECCV.
[28] Andrew Zisserman,et al. Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.
[29] Andrew Zisserman,et al. Video Action Transformer Network , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Luc Van Gool,et al. stagNet: An Attentive Semantic RNN for Group Activity Recognition , 2018, ECCV.
[31] Wang Yan,et al. Visual recognition by counting instances: A multi-instance cardinality potential kernel , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Thomas Brox,et al. Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[33] Matthew J. Hausknecht,et al. Beyond short snippets: Deep networks for video classification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Dong Liu,et al. Deep High-Resolution Representation Learning for Human Pose Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Gang Wang,et al. Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition , 2016, ECCV.
[36] Ming Yang,et al. 3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[37] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[38] Silvio Savarese,et al. Social Scene Understanding: End-to-End Multi-person Action Localization and Collective Activity Recognition , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[39] Alan L. Yuille,et al. An Approach to Pose-Based Action Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[40] Silvio Savarese,et al. Understanding Collective Activitiesof People from Videos , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[41] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[42] Greg Mori,et al. A Hierarchical Deep Temporal Model for Group Activity Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[43] Ahmad Nickabadi,et al. Convolutional Relational Machine for Group Activity Recognition , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[44] Yifan Zhang,et al. Action Recognition with Joints-Pooled 3D Deep Convolutional Descriptors , 2016, IJCAI.
[45] Song-Chun Zhu,et al. CERN: Confidence-Energy Recurrent Network for Group Activity Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[46] Christian Wolf,et al. Human Activity Recognition with Pose-driven Attention to RGB , 2018, BMVC.
[47] Xiao Liu,et al. Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[48] Xin Li,et al. SBGAR: Semantics Based Group Activity Recognition , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[49] Arnold W. M. Smeulders,et al. Timeception for Complex Action Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[50] Cordelia Schmid,et al. P-CNN: Pose-Based CNN Features for Action Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[51] Greg Mori,et al. Social roles in hierarchical models for human activity recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[52] Cordelia Schmid,et al. PoTion: Pose MoTion Representation for Action Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[53] Ruslan Salakhutdinov,et al. Action Recognition using Visual Attention , 2015, NIPS 2015.
[54] Cordelia Schmid,et al. Towards Understanding Action Recognition , 2013, 2013 IEEE International Conference on Computer Vision.
[55] Fei-Fei Li,et al. Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[56] Yu Qiao,et al. RPAN: An End-to-End Recurrent Pose-Attention Network for Action Recognition in Videos , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[57] Mohamed R. Amer,et al. HiRF: Hierarchical Random Field for Collective Activity Recognition in Videos , 2014, ECCV.
[58] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[59] Geoffrey E. Hinton,et al. Generating Text with Recurrent Neural Networks , 2011, ICML.
[60] Jascha Sohl-Dickstein,et al. Capacity and Trainability in Recurrent Neural Networks , 2016, ICLR.
[61] Silvio Savarese,et al. What are they doing? : Collective activity classification using spatio-temporal relationship among people , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.
[62] Andrew Zisserman,et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[63] Wenjun Zeng,et al. An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data , 2016, AAAI.