JRDB-Act: A Large-scale Dataset for Spatio-temporal Action, Social Group and Activity Detection
Silvio Savarese | Ian Reid | Fatemeh Saleh | Mahsa Ehsanpour | Hamid Rezatofighi
[1] Luc Van Gool,et al. Large Scale Holistic Video Understanding , 2019, ECCV.
[2] Jitendra Malik,et al. SlowFast Networks for Video Recognition , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[3] Fei-Fei Li,et al. Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[4] Fei Wang,et al. Eigendecomposition-free Training of Deep Networks with Zero Eigenvalue-based Losses , 2018, ECCV.
[5] Silvio Savarese,et al. Understanding Collective Activities of People from Videos , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[6] Michael I. Jordan,et al. On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.
[7] Pietro Perona,et al. Self-Tuning Spectral Clustering , 2004, NIPS.
[8] Li Fei-Fei,et al. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos , 2015, International Journal of Computer Vision.
[9] Andrew Zisserman,et al. Video Action Transformer Network , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Bolei Zhou,et al. Temporal Relational Reasoning in Videos , 2017, ECCV.
[11] Dima Damen,et al. Scaling Egocentric Vision: The EPIC-KITCHENS Dataset , 2018, ArXiv.
[12] Greg Mori,et al. A Hierarchical Deep Temporal Model for Group Activity Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Haroon Idrees,et al. The THUMOS challenge on action recognition for videos "in the wild" , 2016, Comput. Vis. Image Underst.
[14] Bernt Schiele,et al. A database for fine grained activity detection of cooking activities , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[15] Andrew Zisserman,et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Ian Reid,et al. Joint Learning of Social Groups, Individuals Action and Sub-group Activities in Videos , 2020, ECCV.
[17] Martial Hebert,et al. Efficient visual event detection using volumetric features , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.
[18] Thomas Serre,et al. HMDB: A large video database for human motion recognition , 2011, 2011 International Conference on Computer Vision.
[19] Mubarak Shah,et al. Action MACH: a spatio-temporal Maximum Average Correlation Height filter for action recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.
[20] B. Caputo,et al. Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004.
[21] Matthew J. Hausknecht,et al. Beyond short snippets: Deep networks for video classification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Ali Farhadi,et al. Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding , 2016, ECCV.
[23] Andrew Zisserman,et al. The AVA-Kinetics Localized Human Actions Video Dataset , 2020, ArXiv.
[24] Ali Farhadi,et al. Asynchronous Temporal Fields for Action Recognition , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[25] Ying Wu,et al. Discriminative subvolume search for efficient action detection , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[26] Yihong Gong,et al. Know Your Surroundings: Panoramic Multi-Object Tracking by Multimodality Collaboration , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[27] Silvio Savarese,et al. JRMOT: A Real-Time 3D Multi-Object Tracker and a New Large-Scale Dataset , 2020, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[28] Silvio Savarese,et al. Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Yansong Tang,et al. COIN: A Large-Scale Dataset for Comprehensive Instructional Video Analysis , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Andrew Zisserman,et al. A Better Baseline for AVA , 2018, ArXiv.
[31] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[32] Apostol Natsev,et al. YouTube-8M: A Large-Scale Video Classification Benchmark , 2016, ArXiv.
[33] Silvio Savarese,et al. What are they doing? : Collective activity classification using spatio-temporal relationship among people , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.
[34] Cordelia Schmid,et al. Towards Understanding Action Recognition , 2013, 2013 IEEE International Conference on Computer Vision.
[35] C. Schmid,et al. Actions in context , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[36] Cordelia Schmid,et al. AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[37] Shuai Yi,et al. GroupFormer: Group Activity Recognition with Clustered Spatial-Temporal Transformer , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[38] Silvio Savarese,et al. A Unified Framework for Multi-target Tracking and Collective Activity Recognition , 2012, ECCV.
[39] Kate Saenko,et al. R-C3D: Region Convolutional 3D Network for Temporal Activity Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[40] Andrew Zisserman,et al. Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.
[41] Heng Wang,et al. Scenes-Objects-Actions: A Multi-task, Multi-label Video Dataset , 2018, ECCV.
[42] Trevor Darrell,et al. Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[43] Hang Zhao,et al. HACS: Human Action Clips and Segments Dataset for Recognition and Temporal Localization , 2017, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[44] Mubarak Shah,et al. UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild , 2012, ArXiv.
[45] Susanne Westphal,et al. The “Something Something” Video Database for Learning and Evaluating Visual Common Sense , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[46] Siwei Lyu,et al. Who did What at Where and When: Simultaneous Multi-Person Tracking and Activity Recognition , 2018, ArXiv.
[47] Tao Mei,et al. Recurrent Tubelet Proposal and Recognition Networks for Action Detection , 2018, ECCV.
[48] Fabio Viola,et al. The Kinetics Human Action Video Dataset , 2017, ArXiv.
[49] Xin Yu,et al. The IKEA ASM Dataset: Understanding People Assembling Furniture through Actions, Objects and Pose , 2020, 2021 IEEE Winter Conference on Applications of Computer Vision (WACV).
[50] Pietro Liò,et al. Graph Attention Networks , 2017, ICLR.
[51] Kaiming He,et al. Long-Term Feature Banks for Detailed Video Understanding , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[52] Silvio Savarese,et al. Learning context for collective activity recognition , 2011, CVPR 2011.
[53] Ali Farhadi,et al. Unsupervised Deep Embedding for Clustering Analysis , 2015, ICML.
[54] Luc Van Gool,et al. The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.
[55] Sheng Tang,et al. Overcoming Classifier Imbalance for Long-Tail Object Detection With Balanced Group Softmax , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[56] Ronen Basri,et al. Actions as space-time shapes , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.
[57] Cordelia Schmid,et al. Actor-Centric Relation Network , 2018, ECCV.
[58] Bernard Ghanem,et al. ActivityNet: A large-scale video benchmark for human activity understanding , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[59] Silvio Savarese,et al. JRDB: A Dataset and Benchmark of Egocentric Robot Visual Perception of Humans in Built Environments , 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.