Triplet attention multiple spacetime-semantic graph convolutional network for skeleton-based action recognition

Skeleton-based action recognition has recently attracted widespread attention in the field of computer vision. Previous studies on skeleton-based action recognition are susceptible to interferences from redundant video frames in judging complex actions but ignore the fact that the spatial-temporal features of different actions are extremely different. To solve these problems, we propose a triplet attention multiple spacetime-semantic graph convolutional network for skeleton-based action recognition (AM-GCN), which can not only capture the multiple spacetime-semantic feature from the video images to avoid limited information diversity from single-layer feature representation but can also improve the generalization ability of the network. We also present the triplet attention mechanism to apply an attention mechanism to different key points, key channels, and key frames of the actions, improving the accuracy and interpretability of the judgement of complex actions. In addition, different kinds of spacetime-semantic feature information are combined through the proposed fusion decision for comprehensive prediction in order to improve the robustness of the algorithm. We validate AM-GCN with two standard datasets, NTU-RGBD and Kinetics, and compare it with other mainstream models. The results show that the proposed model achieves tremendous improvement.

[1]  Francisco Herrera,et al.  Object Detection Binary Classifiers methodology based on deep learning to identify small objects handled similarly: Application in video surveillance , 2020, Knowl. Based Syst..

[2]  Fei Cheng,et al.  Spatio-temporal attention on manifold space for 3D human action recognition , 2020, Applied Intelligence.

[3]  Peng Gao,et al.  Learning Reinforced Attentional Representation for End-to-End Visual Tracking , 2019, Inf. Sci..

[4]  Dacheng Tao,et al.  Graph Edge Convolutional Neural Networks for Skeleton-Based Action Recognition , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[5]  Chao Li,et al.  Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation , 2018, IJCAI.

[6]  Xiaolin Zhang,et al.  PGCN-TCA: Pseudo Graph Convolutional Network With Temporal and Channel-Wise Attention for Skeleton-Based Action Recognition , 2020, IEEE Access.

[7]  Yifan Zhang,et al.  Skeleton-Based Action Recognition With Multi-Stream Adaptive Graph Convolutional Networks , 2019, IEEE Transactions on Image Processing.

[8]  Yueting Zhuang,et al.  Fusing Geometric Features for Skeleton-Based Action Recognition Using Multilayer LSTM Networks , 2018, IEEE Transactions on Multimedia.

[9]  Gang Wang,et al.  Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition , 2016, ECCV.

[10]  Bing Li,et al.  Graph convolutional network with structure pooling and joint-wise channel attention for action recognition , 2020, Pattern Recognit..

[11]  Song-Chun Zhu,et al.  Learning Human-Object Interactions by Graph Parsing Neural Networks , 2018, ECCV.

[12]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[13]  Yao Lu,et al.  GAIM: Graph Attention Interaction Model for Collective Activity Recognition , 2020, IEEE Transactions on Multimedia.

[14]  Kangjie Li,et al.  Hierarchical graph attention networks for semi-supervised node classification , 2020, Applied Intelligence.

[15]  Gang Wang,et al.  Deep Multimodal Feature Analysis for Action Recognition in RGB+D Videos , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  In-So Kweon,et al.  CBAM: Convolutional Block Attention Module , 2018, ECCV.

[17]  Hanqing Lu,et al.  Skeleton-Based Action Recognition With Gated Convolutional Neural Networks , 2019, IEEE Transactions on Circuits and Systems for Video Technology.

[18]  Mohammed Bennamoun,et al.  Learning Clip Representations for Skeleton-Based 3D Action Recognition , 2018, IEEE Transactions on Image Processing.

[19]  Shuzhi Sam Ge,et al.  Small traffic sign detection from large image , 2019, Applied Intelligence.

[20]  Hong Liu,et al.  Enhanced skeleton visualization for view invariant human action recognition , 2017, Pattern Recognit..