3D human action recognition based on the Spatial-Temporal Moving Skeleton Descriptor

With the popularization of the Kinect sensor, human actions can be recognized from 3D skeletal information. In this paper, we propose the Spatial-Temporal Moving Skeleton Descriptor (STMSD), which fuses three complementary features: the Relative Geometric Velocity (RGV) between body parts, Relative Joint Positions (RJP), and Joint Angles (JA). The STMSD gives a complete view of the body skeleton in space and time. Of the three features, RGV is introduced for the first time in this work. Inspired by relative geometry based on Lie groups and Lie algebras, RGV describes the rates of variation of the transformations between body parts, which comprise 3D rotations and translations. Interpolation and normalization are then applied to the frame descriptors. After temporal modeling, Principal Component Analysis (PCA) is applied. Experimental results on three datasets show that our approach outperforms existing action recognition approaches, both skeleton-based and other types.
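The pipeline outlined above can be illustrated with a minimal sketch. It is not the authors' implementation. All function names are hypothetical, the velocity term is a plain finite difference of relative joint positions standing in for the Lie-group-based RGV, the per-frame L2 normalization is an assumed choice, and the temporal modeling step is reduced to concatenating resampled frames.

```python
import numpy as np

def relative_joint_positions(frame):
    """Pairwise joint differences for one frame of shape (J, 3)."""
    i, j = np.triu_indices(frame.shape[0], k=1)
    return (frame[i] - frame[j]).ravel()

def joint_angles(frame, triples):
    """Angle at joint b for each (a, b, c) triple, in radians."""
    angles = []
    for a, b, c in triples:
        u = frame[a] - frame[b]
        v = frame[c] - frame[b]
        cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
        angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))
    return np.array(angles)

def frame_descriptors(seq, triples):
    """Fuse RJP, JA, and a velocity term for a sequence of shape (T, J, 3)."""
    rjp = np.stack([relative_joint_positions(f) for f in seq])
    ja = np.stack([joint_angles(f, triples) for f in seq])
    # Assumption: finite differences of RJP as a simple stand-in for RGV.
    vel = np.gradient(rjp, axis=0)
    return np.hstack([rjp, ja, vel])

def resample(desc, n_frames):
    """Linearly interpolate descriptors of shape (T, D) to n_frames rows."""
    t_old = np.linspace(0.0, 1.0, desc.shape[0])
    t_new = np.linspace(0.0, 1.0, n_frames)
    cols = [np.interp(t_new, t_old, desc[:, d]) for d in range(desc.shape[1])]
    return np.stack(cols, axis=1)

def normalize(desc):
    """Scale each frame descriptor to unit L2 norm (assumed normalization)."""
    norms = np.linalg.norm(desc, axis=1, keepdims=True)
    return desc / np.maximum(norms, 1e-12)

def describe_sequence(seq, triples, n_frames):
    """Sequence-level vector: resample, normalize, then concatenate frames."""
    desc = normalize(resample(frame_descriptors(seq, triples), n_frames))
    return desc.ravel()

def pca_project(X, k):
    """Project the rows of X onto the top-k principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T
```

In use, each skeleton sequence (at least two frames) would be passed through `describe_sequence` with a fixed `n_frames`, the resulting vectors stacked into a matrix, and `pca_project` applied before classification. The classifier itself is not specified here.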
