Building Unified Human Descriptors For Multi-Type Activity Recognition

Activity recognition is an important yet difficult task in computer vision. In recent years, many types of activities -- single-person actions, two-person interactions, and ego-centric activities, to name a few -- have been analyzed. Nevertheless, researchers have always treated these activity types separately. In this paper, we propose a new problem: labeling a complex scene in which activities of different types occur in sequence or concurrently. We first present a new unified descriptor, called the Relation History Image (RHI), which can be extracted from all of the activity types we consider. We then propose a new method that recognizes the activities and simultaneously associates them with the humans performing them. Next, we evaluate our approach on a newly recorded dataset that is representative of the problem under consideration. Finally, we demonstrate the efficacy of the RHI descriptor through extensive evaluations on publicly available datasets.
