Novel Skeleton-based Action Recognition Using Covariance Descriptors on Most Informative Joints

Human action recognition has attracted much attention from the research community in recent years due to its wide range of applications, including human-robot interaction, gaming, and surveillance. Action recognition can be implemented with either a single modality (color, depth, or skeletal data) or multiple modalities. Our research focuses on human action recognition based on skeletal information. In the literature, a Temporal Hierarchy of Covariance Descriptors on 3D Joints (Cov3DJ) was proposed to exploit the temporal dimension of skeletal data. Since each joint engages in an action to a different degree, another line of work selects the joints that are most informative for recognition. In this paper, we propose a novel framework, named Covariance Descriptor on Most Informative Joints (CovMIJ), which benefits from the simplicity of the covariance-descriptor representation and gains noise immunity by using only the Most Informative Joints (MIJ). Extensive experiments comparing CovMIJ, Cov3DJ, and the state-of-the-art Res-TCN (Temporal Convolutional Network with residual units) on two public datasets (MSR-Action3D and CMDFALL) demonstrate the efficiency of our proposal. On the MSR-Action3D dataset, CovMIJ achieves an accuracy of 93.6%, whereas Cov3DJ reaches only 90.53%. A similar improvement holds on the CMDFALL dataset, with an accuracy of 64.72% for CovMIJ compared to 61.34% for Cov3DJ. On CMDFALL, CovMIJ also outperforms the deep learning network Res-TCN, with an F1 score of 62.5% versus 39.38%.
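To make the two ingredients of the approach concrete, below is a minimal NumPy sketch of (a) ranking joints by an informativeness measure and (b) computing a covariance descriptor over the selected joints. The variance-of-position ranking and the function names (most_informative_joints, cov_descriptor) are illustrative assumptions on my part; the paper's exact MIJ selection criterion may differ (SMIJ-style approaches, for example, rank joints by the variance of joint angles).

```python
import numpy as np

def most_informative_joints(seq, k):
    """Rank joints by the variance of their 3D positions over time and
    keep the top-k. Variance is used here only as an illustrative
    informativeness measure, not necessarily the paper's MIJ criterion."""
    # seq: (T, J, 3) array of T frames, J joints, 3D coordinates
    variance = seq.var(axis=0).sum(axis=1)   # (J,) total positional variance per joint
    return np.argsort(variance)[::-1][:k]    # indices of the k most "active" joints

def cov_descriptor(seq, joint_idx):
    """Covariance descriptor over the selected joints: each frame is a
    3k-dimensional sample, and the descriptor is the upper triangle of
    the sample covariance matrix (the matrix is symmetric)."""
    samples = seq[:, joint_idx, :].reshape(len(seq), -1)  # (T, 3k)
    cov = np.cov(samples, rowvar=False)                   # (3k, 3k)
    iu = np.triu_indices_from(cov)
    return cov[iu]                                        # fixed-length vector

# Toy usage: 60 frames, 20 joints, keep the 8 most informative joints.
seq = np.random.rand(60, 20, 3)
mij = most_informative_joints(seq, k=8)
desc = cov_descriptor(seq, mij)
print(desc.shape)  # (300,) = 24 * 25 / 2
```

Because the descriptor has a fixed length regardless of sequence duration, sequences of different lengths become directly comparable; descriptors of this kind are typically fed to a standard classifier such as a linear SVM.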
