Temporal Activity Segmentation for Depth Cameras Using Joint Angle Variance Features

In this work, we address the problem of temporally segmenting human activity in videos captured with depth cameras. We propose a novel feature, the joint angle variance (JAV), computed from skeleton joint data in depth videos. We define joint angles as the angles between adjacent skeleton joints and compute the variance of each joint angle over a window of frames; concatenating the variances of all joint angles yields the feature vector for that window. We use these feature vectors to train a support vector machine (SVM). Given a new test sequence, we classify each frame by computing the class probabilities of the JAV feature extracted from the window containing that frame. We demonstrate the efficacy of our approach on a standard dataset of ten actions. Our results highlight that even a simple and efficient formulation can yield good temporal segmentation performance on RGB-D data.
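
To make the pipeline concrete, here is a minimal sketch of the JAV feature and the sliding-window classification it describes. The helper names (joint_angles, jav_feature, segment), the joint-triple list, and the window size of 30 frames are illustrative assumptions, not details from the paper; scikit-learn's SVC (a LIBSVM wrapper) stands in for the SVM with probability estimates.

```python
import numpy as np
from sklearn.svm import SVC  # LIBSVM-backed SVM with probability estimates

def joint_angles(frame, triples):
    """Angle at joint b for each (a, b, c) triple of connected joints.

    frame: (num_joints, 3) array of 3D joint positions for one frame.
    triples: list of (a, b, c) joint-index triples along the skeleton
             (an assumed encoding of "adjacent joints").
    """
    angles = []
    for a, b, c in triples:
        u = frame[a] - frame[b]
        v = frame[c] - frame[b]
        cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)
        angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))
    return np.array(angles)

def jav_feature(window, triples):
    """JAV feature: variance of each joint angle over a window of frames.

    window: (num_frames, num_joints, 3) array of skeleton frames.
    Returns a (num_angles,) vector of per-angle variances.
    """
    angles = np.stack([joint_angles(f, triples) for f in window])
    return angles.var(axis=0)

def segment(frames, triples, clf, win=30, step=1):
    """Per-frame labels from class probabilities of sliding-window JAV features."""
    votes = np.zeros((len(frames), len(clf.classes_)))
    for s in range(0, len(frames) - win + 1, step):
        feat = jav_feature(frames[s:s + win], triples)[None, :]
        votes[s:s + win] += clf.predict_proba(feat)[0]  # accumulate over covering windows
    return clf.classes_[votes.argmax(axis=1)]

# Training sketch: one JAV vector per labeled window of skeleton frames.
# train_windows: list of (win, num_joints, 3) arrays; train_labels: action labels.
# X = np.stack([jav_feature(w, TRIPLES) for w in train_windows])
# clf = SVC(probability=True).fit(X, train_labels)
# labels = segment(test_frames, TRIPLES, clf)
```

Accumulating the class probabilities of every window that covers a frame, rather than taking a single window's hard decision, is one simple way to turn window-level SVM outputs into the per-frame segmentation the abstract describes.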
