Action Recognition Using Rate-Invariant Analysis of Skeletal Shape Trajectories

We study the problem of classifying actions of human subjects using depth movies generated by Kinect or other depth sensors. Representing human body as dynamical skeletons, we study the evolution of their (skeletons') shapes as trajectories on Kendall's shape manifold. The action data is typically corrupted by large variability in execution rates within and across subjects and, thus, causing major problems in statistical analyses. To address that issue, we adopt a recently-developed framework of Su et al. [1], [2] to this problem domain. Here, the variable execution rates correspond to re-parameterizations of trajectories, and one uses a parameterization-invariant metric for aligning, comparing, averaging, and modeling trajectories. This is based on a combination of transported square-root vector fields (TSRVFs) of trajectories and the standard Euclidean norm, that allows computational efficiency. We develop a comprehensive suite of computational tools for this application domain: smoothing and denoising skeleton trajectories using median filtering, up- and down-sampling actions in time domain, simultaneous temporal-registration of multiple actions, and extracting invertible Euclidean representations of actions. Due to invertibility these Euclidean representations allow both discriminative and generative models for statistical analysis. For instance, they can be used in a SVM-based classification of original actions, as demonstrated here using MSR Action-3D, MSR Daily Activity and 3D Action Pairs datasets. Using only the skeletal information, we achieve state-of-the-art classification results on these datasets.

[1]  Anuj Srivastava,et al.  Shape Analysis of Elastic Curves in Euclidean Spaces , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Antonio Fernández-Caballero,et al.  A survey of video datasets for human action and activity recognition , 2013, Comput. Vis. Image Underst..

[3]  J.K. Aggarwal,et al.  Human activity analysis , 2011, ACM Comput. Surv..

[4]  Ramakant Nevatia,et al.  Recognition and Segmentation of 3-D Human Action Using HMM and Multi-class AdaBoost , 2006, ECCV.

[5]  Rama Chellappa,et al.  Silhouette-based gesture and action recognition via modeling trajectories on Riemannian shape manifolds , 2011, Comput. Vis. Image Underst..

[6]  Ying Wu,et al.  Robust 3D Action Recognition with Random Occupancy Patterns , 2012, ECCV.

[7]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[8]  Joseph J. LaViola,et al.  Exploring the Trade-off Between Accuracy and Observational Latency in Action Recognition , 2013, International Journal of Computer Vision.

[9]  A. Mehrabian,et al.  Decoding of inconsistent communications. , 1967, Journal of personality and social psychology.

[10]  Xiaodong Yang,et al.  Recognizing actions using depth motion maps-based histograms of oriented gradients , 2012, ACM Multimedia.

[11]  Pavan K. Turaga,et al.  On advances in differential-geometric approaches for 2D and 3D shape analyses and activity recognition , 2012, Image Vis. Comput..

[12]  Rama Chellappa,et al.  Human Action Recognition by Representing 3D Skeletons as Points in a Lie Group , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Rama Chellappa,et al.  Rate-Invariant Recognition of Humans and Their Activities , 2009, IEEE Transactions on Image Processing.

[14]  Zicheng Liu,et al.  HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Sudeep Sarkar,et al.  Rate-Invariant Analysis of Trajectories on Riemannian Manifolds with Application in Visual Speech Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Ying Wu,et al.  Mining actionlet ensemble for action recognition with depth cameras , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Zhengyou Zhang,et al.  Microsoft Kinect Sensor and Its Effect , 2012, IEEE Multim..

[18]  Junsong Yuan,et al.  Learning Actionlet Ensemble for 3D Human Action Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Ilya Sutskever,et al.  Learning Recurrent Neural Networks with Hessian-Free Optimization , 2011, ICML.

[20]  Bart Selman,et al.  Unstructured human activity detection from RGBD images , 2011, 2012 IEEE International Conference on Robotics and Automation.

[21]  Leonid Sigal,et al.  Poselet Key-Framing: A Model for Human Activity Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Rama Chellappa,et al.  Statistical Computations on Grassmann and Stiefel Manifolds for Image and Video-Based Recognition , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Wanqing Li,et al.  Action recognition based on a bag of 3D points , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[24]  Meinard Müller,et al.  Motion templates for automatic classification and retrieval of motion capture data , 2006, SCA '06.

[25]  Anuj Srivastava,et al.  Statistical Shape Analysis , 2014, Computer Vision, A Reference Guide.

[26]  Jake K. Aggarwal,et al.  View invariant human action recognition using histograms of 3D joints , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[27]  Xiaodong Yang,et al.  Effective 3D action recognition using EigenJoints , 2014, J. Vis. Commun. Image Represent..

[28]  Hazem Wannous,et al.  Grassmannian Representation of Motion Depth for 3D Human Gesture and Action Recognition , 2014, 2014 22nd International Conference on Pattern Recognition.

[29]  Alberto Del Bimbo,et al.  Space-Time Pose Representation for 3D Human Action Recognition , 2013, ICIAP Workshops.

[30]  Andrea Kleinsmith,et al.  Affective Body Expression Perception and Recognition: A Survey , 2013, IEEE Transactions on Affective Computing.

[31]  Hema Swetha Koppula,et al.  Learning Spatio-Temporal Structure from RGB-D Videos for Human Activity Detection and Anticipation , 2013, ICML.

[32]  Cristian Sminchisescu,et al.  The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[33]  Anuj Srivastava,et al.  Statistical analysis of trajectories on Riemannian manifolds: Bird migration, hurricane tracking and video surveillance , 2014, 1405.0803.

[34]  Jenq-Neng Hwang,et al.  A Review on Video-Based Human Activity Recognition , 2013, Comput..

[35]  Yaser Sheikh,et al.  Matching Trajectories of Anatomical Landmarks Under Viewpoint, Anthropometric and Temporal Transforms , 2009, International Journal of Computer Vision.

[36]  Ashok Veeraraghavan,et al.  The Function Space of an Activity , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).