Building Unified Human Descriptors For Multi-Type Activity Recognition

Activity recognition is an important yet difficult task in computer vision. In recent years, many types of activities -- single-person actions, two-person interactions, and ego-centric activities, to name a few -- have been analyzed. Nevertheless, researchers have always treated these activity types separately. In this paper, we propose a new problem: labeling a complex scene in which activities of different types occur in sequence or concurrently. We first present a new unified descriptor, called the Relation History Image (RHI), which can be extracted from all of the activity types we consider. We then propose a new method that recognizes the activities and simultaneously associates them with the humans performing them. Next, we evaluate our approach on a newly recorded dataset that is representative of the problem under consideration. Finally, we demonstrate the efficacy of the RHI descriptor through extensive evaluations on publicly available datasets.
