Multi-stream slowFast graph convolutional networks for skeleton-based action recognition

Abstract Recently, many efforts have been made to model spatial–temporal features from human skeleton for action recognition by using graph convolutional networks (GCN). Skeleton sequence can precisely represent human pose with a small number of joints while there is still a lot of redundancies across the skeleton sequence in the term of temporal dependency. In order to improve the effectiveness of spatial–temporal feature extraction from skeleton sequence, a SlowFast graph convolution network (SF-GCN) is proposed by implementing the architecture of SlowFast network, which is consisted of the Fast and Slow pathway, in the GCN model. The Fast pathway is a temporal attention embedded lightweight GCN for extracting the feature of fast temporal changes from the skeleton sequence with a high frame rate and fast refreshing speed. The Slow pathway is a spatial attention embedded GCN for extracting the feature of slow temporal changes from the skeleton sequence with a low frame rate and slow refreshing speed. The features of two pathways are fused by using lateral connection and weighted by using channel attention. Based on the aforementioned design, SF-GCN can achieve superior ability of feature extraction while the computational cost significantly drops. In addition to the coordinate information of joints, five high order sequences including edge, the spatial difference and temporal difference of joints and edges are induced to enhance the representation of human action. Six SF-GCNs are implemented for extracting spatial–temporal feature from six kinds of sequences and fused for skeleton-based action recognition, which is called multi-stream SlowFast graph convolutional networks (MSSF-GCN). Extensive experiments are conducted to evaluate the proposed method on three skeleton-based action recognition databases including NTU RGB + D, NTU RGB + D 120, and Skeleton-Kinetics. The results show that the proposed method is effective for skeleton-based action recognition and can achieve the recognition accuracy with an obvious advantage in comparison with the state-of-the-art.

[1]  Boxin Shi,et al.  Context-Aware Cross-Attention for Skeleton-Based Human Action Recognition , 2020, IEEE Access.

[2]  Joan Bruna,et al.  Spectral Networks and Locally Connected Networks on Graphs , 2013, ICLR.

[3]  Yiming Hu,et al.  Convolutional relation network for skeleton-based action recognition , 2019, Neurocomputing.

[4]  Tong Zhang,et al.  Si-GCN: Structure-induced Graph Convolution Network for Skeleton-based Action Recognition , 2019, 2019 International Joint Conference on Neural Networks (IJCNN).

[5]  Xu Chen,et al.  Actional-Structural Graph Convolutional Networks for Skeleton-Based Action Recognition , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Tao Lei,et al.  A review of Convolutional-Neural-Network-based action recognition , 2019, Pattern Recognit. Lett..

[7]  Gang Sun,et al.  Squeeze-and-Excitation Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8]  Yifan Zhang,et al.  Skeleton-Based Action Recognition With Shift Graph Convolutional Network , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Nanning Zheng,et al.  Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Gang Wang,et al.  NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Dacheng Tao,et al.  Graph Edge Convolutional Neural Networks for Skeleton-Based Action Recognition , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[12]  Jorge Nocedal,et al.  Optimization Methods for Large-Scale Machine Learning , 2016, SIAM Rev..

[13]  Fabio Viola,et al.  The Kinetics Human Action Video Dataset , 2017, ArXiv.

[14]  Wenjun Zeng,et al.  An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data , 2016, AAAI.

[15]  Kurt Keutzer,et al.  Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  Petros Daras,et al.  Motion analysis: Action detection, recognition and evaluation based on motion capture data , 2018, Pattern Recognit..

[17]  Jiaying Liu,et al.  Optimized Skeleton-based Action Recognition via Sparsified Graph Regression , 2018, ACM Multimedia.

[18]  Tieniu Tan,et al.  An Attention Enhanced Graph Convolutional LSTM Network for Skeleton-Based Action Recognition , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Hong Liu,et al.  Sample Fusion Network: An End-to-End Data Augmentation Network for Skeleton-Based Human Action Recognition , 2019, IEEE Transactions on Image Processing.

[20]  Lei Shi,et al.  Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Fei Wu,et al.  Spatio-Temporal Graph Routing for Skeleton-Based Action Recognition , 2019, AAAI.

[22]  Max Welling,et al.  Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.

[23]  Gang Wang,et al.  NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Mohammed Bennamoun,et al.  Learning Clip Representations for Skeleton-Based 3D Action Recognition , 2018, IEEE Transactions on Image Processing.

[25]  Wenwen Ding,et al.  Global relational reasoning with spatial temporal graph interaction networks for skeleton-based action recognition , 2020, Signal Process. Image Commun..

[26]  Andrew Zisserman,et al.  Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.

[27]  Dahua Lin,et al.  Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition , 2018, AAAI.

[28]  Lei Shi,et al.  Skeleton-Based Action Recognition With Directed Graph Neural Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Mohammed Bennamoun,et al.  SkeletonNet: Mining Deep Part Features for 3-D Action Recognition , 2017, IEEE Signal Processing Letters.

[30]  Jitendra Malik,et al.  SlowFast Networks for Video Recognition , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[31]  Bjorn Ottersten,et al.  Vertex Feature Encoding and Hierarchical Temporal Modeling in a Spatio-Temporal Graph Convolutional Network for Action Recognition , 2019, 2020 25th International Conference on Pattern Recognition (ICPR).

[32]  Zhengyou Zhang,et al.  Microsoft Kinect Sensor and Its Effect , 2012, IEEE Multim..

[33]  Ezzeddine Zagrouba,et al.  Abnormal behavior recognition for intelligent video surveillance systems: A review , 2018, Expert Syst. Appl..

[34]  Mathias Niepert,et al.  Learning Convolutional Neural Networks for Graphs , 2016, ICML.

[35]  Joan Bruna,et al.  Deep Convolutional Networks on Graph-Structured Data , 2015, ArXiv.

[36]  M Naveenkumar,et al.  Deep ensemble network using distance maps and body part features for skeleton based action recognition , 2020, Pattern Recognit..

[37]  Liang Wang,et al.  Beyond Joints: Learning Representations From Primitive Geometries for Skeleton-Based Action Recognition and Detection , 2018, IEEE Transactions on Image Processing.

[38]  Austin Reiter,et al.  Interpretable 3D Human Action Analysis with Temporal Convolutional Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[39]  Gang Wang,et al.  Skeleton-Based Human Action Recognition With Global Context-Aware Attention LSTM Networks , 2017, IEEE Transactions on Image Processing.

[40]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  Hong Liu,et al.  Enhanced skeleton visualization for view invariant human action recognition , 2017, Pattern Recognit..