Jinhui Tang | Lingxi Xie | Xiangbo Shu | Rui Yan
[1] J. Andrew Bagnell, et al. Approximate MaxEnt Inverse Optimal Control and Its Application for Mental Simulation of Human Interactions, 2015, AAAI.
[2] Martial Hebert, et al. Activity Forecasting, 2012, ECCV.
[3] Cordelia Schmid, et al. Actor-Centric Relation Network, 2018, ECCV.
[4] Marc'Aurelio Ranzato, et al. Video (language) modeling: a baseline for generative models of natural videos, 2014, ArXiv.
[5] Andrew Zisserman, et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Yue Zhao, et al. Intra- and Inter-Action Understanding via Temporal Action Parsing, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Yue Zhao, et al. FineGym: A Hierarchical Video Dataset for Fine-Grained Action Understanding, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Nitish Srivastava, et al. Unsupervised Learning of Video Representations using LSTMs, 2015, ICML.
[9] Ross B. Girshick, et al. Fast R-CNN, 2015, arXiv:1504.08083.
[10] Asim Kadav, et al. Attend and Interact: Higher-Order Object Interactions for Video Understanding, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[11] Abhinav Gupta, et al. Interpretable Intuitive Physics Model, 2018, ECCV.
[12] Abhinav Gupta, et al. Non-local Neural Networks, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[13] Juan Carlos Niebles, et al. Action Genome: Actions As Compositions of Spatio-Temporal Scene Graphs, 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Deva Ramanan, et al. CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning, 2020, ICLR.
[15] Ming Yang, et al. 3D Convolutional Neural Networks for Human Action Recognition, 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[16] Yann LeCun, et al. Deep multi-scale video prediction beyond mean square error, 2015, ICLR.
[17] Andrew Zisserman, et al. Video Representation Learning by Dense Predictive Coding, 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).
[18] Trevor Darrell, et al. Something-Else: Compositional Action Recognition With Spatial-Temporal Interaction Networks, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Jiajun Wu, et al. Physics 101: Learning Physical Object Properties from Unlabeled Videos, 2016, BMVC.
[20] Christian Wolf, et al. Object Level Visual Reasoning in Videos, 2018, ECCV.
[21] Fabio Viola, et al. The Kinetics Human Action Video Dataset, 2017, ArXiv.
[22] Oriol Vinyals, et al. Representation Learning with Contrastive Predictive Coding, 2018, ArXiv.
[23] Jinhui Tang, et al. Social Adaptive Module for Weakly-supervised Group Activity Recognition, 2020, ECCV.
[24] Juan Carlos Niebles, et al. Learning to Decompose and Disentangle Representations for Video Prediction, 2018, NeurIPS.
[25] Ali Farhadi, et al. Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding, 2016, ECCV.
[26] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[27] Vighnesh Birodkar, et al. Unsupervised Learning of Disentangled Representations from Video, 2017, NIPS.
[28] Chuang Gan, et al. TSM: Temporal Shift Module for Efficient Video Understanding, 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[29] Susanne Westphal, et al. The “Something Something” Video Database for Learning and Evaluating Visual Common Sense, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[30] Jitendra Malik, et al. SlowFast Networks for Video Recognition, 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[31] Michael S. Bernstein, et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations, 2016, International Journal of Computer Vision.
[32] Ronald J. Williams, et al. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks, 1989, Neural Computation.
[33] Abhinav Gupta, et al. Videos as Space-Time Region Graphs, 2018, ECCV.
[34] Jinhui Tang, et al. HiGCIN: Hierarchical Graph-Based Cross Inference Network for Group Activity Recognition, 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[35] Yi Ma, et al. Learning Long-term Visual Dynamics with Region Proposal Interaction Networks, 2020, ICLR.
[36] Alexei A. Efros, et al. Unbiased look at dataset bias, 2011, CVPR 2011.
[37] Honglak Lee, et al. Action-Conditional Video Prediction using Deep Networks in Atari Games, 2015, NIPS.
[38] Andrew Zisserman, et al. Memory-augmented Dense Predictive Coding for Video Representation Learning, 2020, ECCV.