论文信息 - Skeleton-Based Square Grid for Human Action Recognition With 3D Convolutional Neural Network

Skeleton-Based Square Grid for Human Action Recognition With 3D Convolutional Neural Network

Convolutional neural networks (CNNs) can effectively handle grid-structured data but not dynamic skeletons, which are usually expressed as graph structures. In this study, we first propose a skeleton-based square grid (SSG) for transforming dynamic skeletons into three-dimensional (3D) grid-structured data so that CNNs can be applied to such data. Each SSG contains a joint-based square grid (JSG) and a rigid-based square grid (RSG) based on intrinsic and extrinsic dependencies of various body parts, respectively. Next, to enhance the ability of deep features to capture the correlations among 3D grid-structured data, a two-stream 3D CNN is constructed to learn spatiotemporal features using the JSG and RSG sequences. Finally, we introduce a soft attention model that selectively focuses on the informative body parts in the skeleton sequences. We validate our model in terms of action recognition using three datasets: NTU RGB+D, Kinetics Motion, and SBU Kinect Interaction datasets. Our experimental results demonstrate the effectiveness of the proposed approach as well as its superior performance when compared with those of state-of-the-art methods.

Wenwen Ding | Chongyang Ding | Kai Liu | Guang Li

[1] Yaser Sheikh,et al. OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2] Spatio-Temporal Image Representation of 3D Skeletal Movements for View-Invariant Action Recognition with Deep Convolutional Neural Networks , 2019 .

[3] Xiaohui Xie,et al. Co-Occurrence Feature Learning for Skeleton Based Action Recognition Using Regularized Deep LSTM Networks , 2016, AAAI.

[4] Jian Liu,et al. Skepxels: Spatio-temporal Image Representation of Human Skeleton Joints for Action Recognition , 2017, CVPR Workshops.

[5] Jian-Huang Lai,et al. Jointly Learning Heterogeneous Features for RGB-D Activity Recognition , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6] Zahra Gharaee. Hierarchical growing grid networks for skeleton based action recognition , 2020, Cognitive Systems Research.

[7] Mohammed Bennamoun,et al. A New Representation of Skeleton Sequences for 3D Action Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8] Ming Yang,et al. 3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9] Dit-Yan Yeung,et al. Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting , 2015, NIPS.

[10] Gang Wang,et al. Global Context-Aware Attention LSTM Networks for 3D Action Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11] Alex Graves,et al. Supervised Sequence Labelling , 2012 .

[12] Nanning Zheng,et al. View Adaptive Neural Networks for High Performance Skeleton-Based Human Action Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13] Dimitris Samaras,et al. Two-person interaction detection using body-pose features and multiple instance learning , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[14] Xavier Bresson,et al. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering , 2016, NIPS.

[15] Dacheng Tao,et al. Graph Edge Convolutional Neural Networks for Skeleton-Based Action Recognition , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[16] Luca Antiga,et al. Automatic differentiation in PyTorch , 2017 .

[17] Fabio Viola,et al. The Kinetics Human Action Video Dataset , 2017, ArXiv.

[18] Max Welling,et al. Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.

[19] Fei-Fei Li,et al. Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[20] Hanqing Lu,et al. Body Joint Guided 3-D Deep Convolutional Descriptors for Action Recognition , 2018, IEEE Transactions on Cybernetics.

[21] Razvan Pascanu,et al. How to Construct Deep Recurrent Neural Networks , 2013, ICLR.

[22] Mohammed Bennamoun,et al. Learning Clip Representations for Skeleton-Based 3D Action Recognition , 2018, IEEE Transactions on Image Processing.

[23] Richard M. Murray,et al. A Mathematical Introduction to Robotic Manipulation , 1994 .

[24] Klaus-Robert Müller,et al. Efficient BackProp , 2012, Neural Networks: Tricks of the Trade.

[25] Jiebo Luo,et al. Action Recognition With Spatio–Temporal Visual Attention on Skeleton Image Sequences , 2018, IEEE Transactions on Circuits and Systems for Video Technology.

[26] Joshua Goodman,et al. Classes for fast maximum entropy training , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[27] Wenwen Ding,et al. Global relational reasoning with spatial temporal graph interaction networks for skeleton-based action recognition , 2020, Signal Process. Image Commun..

[28] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[29] Razvan Pascanu,et al. On the difficulty of training recurrent neural networks , 2012, ICML.

[30] Dahua Lin,et al. Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition , 2018, AAAI.

[31] Geoffrey E. Hinton,et al. Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[32] Wenjun Zeng,et al. An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data , 2016, AAAI.

[33] Alex Graves,et al. Supervised Sequence Labelling with Recurrent Neural Networks , 2012, Studies in Computational Intelligence.

[34] Tinne Tuytelaars,et al. Modeling video evolution for action recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35] Lin Gao,et al. Graph CNNs with Motif and Variable Temporal Block for Skeleton-Based Action Recognition , 2019, AAAI.

[36] Wei Chu,et al. Multi-category Classification by Soft-Max Combination of Binary Classifiers , 2003, Multiple Classifier Systems.

[37] Lei Shi,et al. Adaptive Spectral Graph Convolutional Networks for Skeleton-Based Action Recognition , 2018, ArXiv.

[38] Louahdi Khoudour,et al. A Unified Deep Framework for Joint 3D Pose Estimation and Action Recognition from a Single RGB Camera , 2019, Sensors.

[39] Gang Wang,et al. Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition , 2016, ECCV.

[40] Raja Parasuraman,et al. Attention, biological motion, and action recognition , 2012, NeuroImage.

[41] Austin Reiter,et al. Interpretable 3D Human Action Analysis with Temporal Convolutional Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[42] Yong Du,et al. Hierarchical recurrent neural network for skeleton based action recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43] Gang Wang,et al. NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44] Hong Cheng,et al. Interactive body part contrast mining for human interaction recognition , 2014, 2014 IEEE International Conference on Multimedia and Expo Workshops (ICMEW).

[45] Gang Hua,et al. Attention-based Temporal Weighted Convolutional Neural Network for Action Recognition , 2018, AIAI.

[46] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[47] Mooi Choo Chuah,et al. Category-Blind Human Action Recognition: A Practical Recognition System , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).