Adaptive most joint selection and covariance descriptors for a robust skeleton-based human action recognition

In this paper, we propose two effective ways of utilizing skeleton data for human action recognition (HAR). On the one hand, the proposed method exploits skeleton data for their robustness to changes in human appearance and for their real-time performance. On the other hand, it avoids inherent drawbacks of skeleton data, such as noise and incorrect skeleton estimation caused by self-occlusion of the human pose. To this end, in terms of feature design, we propose to extract covariance descriptors from joint velocity and combine them with those from joint position. In terms of 3-D skeleton-based activity representation, we propose two schemes to select the most informative joints. The proposed method is evaluated on two benchmark datasets. On the MSRAction-3D dataset, the proposed method outperforms various methods based on hand-designed features. On the challenging CMDFall dataset, the proposed method significantly improves accuracy compared with techniques based on recent neural networks. Finally, we investigate the robustness of the proposed method via a cross-dataset evaluation.
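
The feature design described above can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a skeleton sequence stored as a NumPy array of shape (frames, joints, 3), approximates joint velocity by frame-to-frame differences, and keeps the upper triangle of each covariance matrix as the descriptor. The abstract does not specify the two joint selection schemes, so the variance-based ranking in `select_joints` is only a hypothetical stand-in, and all function names and array sizes are illustrative.

```python
import numpy as np

def select_joints(seq, k):
    """Keep the k joints with the largest positional variance.

    Assumption: this ranking criterion is a placeholder, not one of the
    two selection schemes proposed in the paper.
    """
    # seq has shape (T, J, 3); variance over time, summed over x, y, z
    score = seq.var(axis=0).sum(axis=1)
    keep = np.sort(np.argsort(score)[-k:])
    return seq[:, keep, :], keep

def covariance_descriptor(features):
    """Upper triangle of the covariance matrix of a (T, D) feature sequence."""
    cov = np.cov(features, rowvar=False)   # shape (D, D), columns are variables
    rows, cols = np.triu_indices(cov.shape[0])
    return cov[rows, cols]                 # D * (D + 1) / 2 values

def action_descriptor(seq, k):
    """Concatenate position and velocity covariance descriptors."""
    sub, _ = select_joints(seq, k)
    pos = sub.reshape(sub.shape[0], -1)    # (T, 3k) joint positions
    vel = np.diff(pos, axis=0)             # (T-1, 3k) finite-difference velocity
    return np.concatenate([covariance_descriptor(pos),
                           covariance_descriptor(vel)])

# Synthetic sequence: 40 frames, 20 joints (placeholder sizes)
rng = np.random.default_rng(0)
seq = rng.normal(size=(40, 20, 3))
desc = action_descriptor(seq, k=5)
print(desc.shape)  # (240,) = 2 * (15 * 16 / 2)
```

The resulting fixed-length vector does not depend on the number of frames, so sequences of different durations can be passed to a standard classifier without temporal alignment.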
