Global relational reasoning with spatial temporal graph interaction networks for skeleton-based action recognition

Abstract With the prevalence of accessible depth sensors, dynamic skeletons have attracted much attention as a robust modality for action recognition. Convolutional neural networks (CNNs) excel at modeling local relations within local receptive fields and are typically inefficient at capturing global relations. In this article, we first view the dynamic skeletons as a spatio-temporal graph (STG) and then learn the localized correlated features that generate the embedded nodes of the STG by message passing. To better extract global relational information, a novel model called spatial–temporal graph interaction networks (STG-INs) is proposed, which perform long-range temporal modeling of human body parts. In this model, human body parts are mapped to an interaction space where graph-based reasoning can be efficiently implemented via a graph convolutional network (GCN). After reasoning, global relation-aware features are distributed back to the embedded nodes of the STG. To evaluate our model, we conduct extensive experiments on three large-scale datasets. The experimental results demonstrate the effectiveness of our proposed model, which achieves the state-of-the-art performance.

[1]  Dahua Lin,et al.  Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition , 2018, AAAI.

[2]  Jürgen Schmidhuber,et al.  Learning to forget: continual prediction with LSTM , 1999 .

[3]  Gang Wang,et al.  Feature Boosting Network For 3D Pose Estimation , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[5]  Dimitris Samaras,et al.  Two-person interaction detection using body-pose features and multiple instance learning , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[6]  Pascal Frossard,et al.  The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains , 2012, IEEE Signal Processing Magazine.

[7]  Jiayi Luo,et al.  Skeleton-based action recognition by part-aware graph convolutional networks , 2019, The Visual Computer.

[8]  Philip S. Yu,et al.  A Comprehensive Survey on Graph Neural Networks , 2019, IEEE Transactions on Neural Networks and Learning Systems.

[9]  Jure Leskovec,et al.  Inductive Representation Learning on Large Graphs , 2017, NIPS.

[10]  Samuel S. Schoenholz,et al.  Neural Message Passing for Quantum Chemistry , 2017, ICML.

[11]  Fabio Viola,et al.  The Kinetics Human Action Video Dataset , 2017, ArXiv.

[12]  Razvan Pascanu,et al.  Interaction Networks for Learning about Objects, Relations and Physics , 2016, NIPS.

[13]  Wei Chu,et al.  Multi-category Classification by Soft-Max Combination of Binary Classifiers , 2003, Multiple Classifier Systems.

[14]  Gang Wang,et al.  Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition , 2016, ECCV.

[15]  Mark Tygert,et al.  A Mathematical Motivation for Complex-Valued Convolutional Networks , 2015, Neural Computation.

[16]  Hong Cheng,et al.  Interactive body part contrast mining for human interaction recognition , 2014, 2014 IEEE International Conference on Multimedia and Expo Workshops (ICMEW).

[17]  Wei Hu,et al.  Generalized Graph Convolutional Networks for Skeleton-based Action Recognition , 2018, ArXiv.

[18]  Gang Wang,et al.  Skeleton-Based Online Action Prediction Using Scale Selection Network , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Zhiyuan Liu,et al.  Graph Neural Networks: A Review of Methods and Applications , 2018, AI Open.

[20]  Xavier Bresson,et al.  Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering , 2016, NIPS.

[21]  Max Welling,et al.  Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.

[22]  Ah Chung Tsoi,et al.  The Graph Neural Network Model , 2009, IEEE Transactions on Neural Networks.

[23]  Dong-Seong Kim,et al.  Image representation of pose-transition feature for 3D skeleton-based action recognition , 2020, Inf. Sci..

[24]  Austin Reiter,et al.  Interpretable 3D Human Action Analysis with Temporal Convolutional Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[25]  Jian Yang,et al.  Spatio-Temporal Graph Convolution for Skeleton Based Action Recognition , 2018, AAAI.

[26]  Mohammed Bennamoun,et al.  A New Representation of Skeleton Sequences for 3D Action Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Razvan Pascanu,et al.  Relational inductive biases, deep learning, and graph networks , 2018, ArXiv.

[28]  Pierre Vandergheynst,et al.  Geometric Deep Learning: Going beyond Euclidean data , 2016, IEEE Signal Process. Mag..

[29]  Stéphane Mallat,et al.  Invariant Scattering Convolution Networks , 2012, IEEE transactions on pattern analysis and machine intelligence.

[30]  Jiebo Luo,et al.  Action Recognition With Spatio–Temporal Visual Attention on Skeleton Image Sequences , 2018, IEEE Transactions on Circuits and Systems for Video Technology.

[31]  Jure Leskovec,et al.  Hierarchical Graph Representation Learning with Differentiable Pooling , 2018, NeurIPS.

[32]  Kai Liu,et al.  Tensor-based linear dynamical systems for action recognition from 3D skeletons , 2018, Pattern Recognit..

[33]  Dacheng Tao,et al.  Graph Edge Convolutional Neural Networks for Skeleton-Based Action Recognition , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[34]  Kai Liu,et al.  Profile HMMs for skeleton-based human action recognition , 2016, Signal Process. Image Commun..

[35]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Lei Shi,et al.  Adaptive Spectral Graph Convolutional Networks for Skeleton-Based Action Recognition , 2018, ArXiv.

[37]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[38]  Dit-Yan Yeung,et al.  Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting , 2015, NIPS.

[39]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.