Si-GCN: Structure-induced Graph Convolution Network for Skeleton-based Action Recognition

In recent years, the graph-convolution networks have been used to solve the problem of skeleton-based action recognition. Previous works often adopted a structure-fixed graph to model the physical joints of human skeleton, but cannot well consider these interactions of different human parts (e.g., the right arm and the left leg) to some extent. To deal with this problem, we propose a novel structure-induced graph convolution network (Si-GCN) framework to boost the performance of the skeleton-based action recognition task. Given a video sequence of human skeletons, the Si-GCN can produce the sample-wise category in an end-to-end way. Specifically, according to the natural divisions of human body, we define a collection of intra-part graphs for each input human skeleton (i.e., each graph denotes a specific part/global of human skeleton), and then formulate an inter-graph to model the relationships of different intra-part graphs. The Si-GCN framework, which will then perform the spectral graph convolutions on these constructed intra/inter-part graphs, can not only capture the internal modalities of each human part/subgraph, but also consider the interactions/relationships between different human parts. A temporal convolution follows to model the temporal and spatial dynamics of the skeleton in combination with the characteristics of time and space. Comprehensive evaluations on two public datasets (including NTU RGB+D and HDM05) well demonstrate the superiority of our proposed Si-GCN when compared with existing skeleton-based action recognition approaches.

[1]  Chao Li,et al.  Skeleton-based action recognition with convolutional neural networks , 2017, 2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW).

[2]  Hema Swetha Koppula,et al.  Anticipating Human Activities Using Object Affordances for Reactive Robotic Response , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Marco La Cascia,et al.  3D skeleton-based human action classification: A survey , 2016, Pattern Recognit..

[4]  Xavier Bresson,et al.  Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering , 2016, NIPS.

[5]  Mathias Niepert,et al.  Learning Convolutional Neural Networks for Graphs , 2016, ICML.

[6]  Jian Yang,et al.  Spatio-Temporal Graph Convolution for Skeleton Based Action Recognition , 2018, AAAI.

[7]  Mohammed Bennamoun,et al.  A New Representation of Skeleton Sequences for 3D Action Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  James M. Rehg,et al.  Movement Pattern Histogram for Action Recognition and Retrieval , 2014, ECCV.

[9]  Joan Bruna,et al.  Deep Convolutional Networks on Graph-Structured Data , 2015, ArXiv.

[10]  Mingyi He,et al.  3D skeleton based action recognition by video-domain translation-scale invariant mapping and multi-scale dilated CNN , 2018, Multimedia Tools and Applications.

[11]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Fan Chung,et al.  Spectral Graph Theory , 1996 .

[13]  Jake K. Aggarwal,et al.  Robot-Centric Activity Prediction from First-Person Videos: What Will They Do to Me? , 2015, 2015 10th ACM/IEEE International Conference on Human-Robot Interaction (HRI).

[14]  Jian Yang,et al.  Gaussian-Induced Convolution for Graphs , 2018, AAAI.

[15]  Pichao Wang,et al.  Action Recognition Based on Joint Trajectory Maps Using Convolutional Neural Networks , 2016, ACM Multimedia.

[16]  Hossein Ragheb,et al.  MuHAVi: A Multicamera Human Action Video Dataset for the Evaluation of Action Recognition Methods , 2010, 2010 7th IEEE International Conference on Advanced Video and Signal Based Surveillance.

[17]  Marwan Torki,et al.  Human Action Recognition Using a Temporal Hierarchy of Covariance Descriptors on 3D Joint Locations , 2013, IJCAI.

[18]  Yi Lin,et al.  Skeleton based action recognition using translation-scale invariant image mapping and multi-scale deep CNN , 2017, 2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW).

[19]  Nanning Zheng,et al.  View Adaptive Recurrent Neural Networks for High Performance Human Action Recognition from Skeleton Data , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[20]  Yong Du,et al.  Hierarchical recurrent neural network for skeleton based action recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Thi-Lan Le,et al.  Novel Skeleton-based Action Recognition Using Covariance Descriptors on Most Informative Joints , 2018, 2018 10th International Conference on Knowledge and Systems Engineering (KSE).

[22]  Gang Wang,et al.  NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Yong Du,et al.  Skeleton based action recognition with convolutional neural network , 2015, 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR).

[24]  Bing Li,et al.  Graph Based Skeleton Motion Representation and Similarity Measurement for Action Recognition , 2016, ECCV.

[25]  Jian Yang,et al.  Action-Attending Graphic Neural Network , 2017, IEEE Transactions on Image Processing.

[26]  Xavier Bresson,et al.  Structured Sequence Modeling with Graph Convolutional Recurrent Networks , 2016, ICONIP.

[27]  Pichao Wang,et al.  Skeleton Optical Spectra-Based Action Recognition Using Convolutional Neural Networks , 2018, IEEE Transactions on Circuits and Systems for Video Technology.

[28]  Luc Van Gool,et al.  A Riemannian Network for SPD Matrix Learning , 2016, AAAI.

[29]  Luc Van Gool,et al.  Temporal Segment Networks: Towards Good Practices for Deep Action Recognition , 2016, ECCV.

[30]  Pichao Wang,et al.  Action Recognition Based on Joint Trajectory Maps with Convolutional Neural Networks , 2018, Knowl. Based Syst..

[31]  Wenjun Zeng,et al.  Spatio-Temporal Attention-Based LSTM Networks for 3D Action Recognition and Detection , 2018, IEEE Transactions on Image Processing.

[32]  Rama Chellappa,et al.  Human Action Recognition by Representing 3D Skeletons as Points in a Lie Group , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Jiaying Liu,et al.  PKU-MMD: A Large Scale Benchmark for Continuous Multi-Modal Human Action Understanding , 2017, ArXiv.

[34]  Gang Wang,et al.  Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition , 2016, ECCV.

[35]  Shuicheng Yan,et al.  Multi-loss Regularized Deep Neural Network , 2016, IEEE Transactions on Circuits and Systems for Video Technology.

[36]  Luc Van Gool,et al.  Deep Learning on Lie Groups for Skeleton-Based Action Recognition , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Tido Röder,et al.  Documentation Mocap Database HDM05 , 2007 .

[38]  Mohsen Ramezani,et al.  A review on human action analysis in videos for retrieval applications , 2016, Artificial Intelligence Review.

[39]  Gang Wang,et al.  Skeleton-Based Action Recognition Using Spatio-Temporal LSTM Network with Trust Gates , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40]  Wenjun Zeng,et al.  Online Human Action Detection using Joint Classification-Regression Recurrent Neural Networks , 2016, ECCV.

[41]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Xiaodong Yang,et al.  EigenJoints-based action recognition using Naïve-Bayes-Nearest-Neighbor , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[43]  Dahua Lin,et al.  Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition , 2018, AAAI.