Paper Information - Ensemble One-Dimensional Convolution Neural Networks for Skeleton-Based Action Recognition

This letter proposes an ensemble neural network (Ensem-NN) for skeleton-based action recognition. The Ensem-NN builds on the core idea of ensemble learning: "two heads are better than one." Based on the properties of skeleton sequences, we design a one-dimensional convolutional neural network with a residual structure as the Base-Net. Moving from the entire body to local parts, and from attention to motion, we design four different subnets based on the Base-Net to extract diverse features. The first subnet is a Two-stream Entirety Net, which operates on the entire skeleton and explores both temporal and spatial features. The second is a Body-part Net, which extracts fine-grained spatial and temporal features. The third is an Attention Net, in which a channel-wise attention mechanism learns important frames and feature channels. The fourth is a Frame-difference Net, which explores motion features. Finally, the four subnets are fused into one ensemble network. Experimental results show that the proposed Ensem-NN performs better than state-of-the-art methods on three widely used datasets.
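As a rough illustration of two ideas named in the abstract, the sketch below computes frame-difference (motion) features from a toy skeleton sequence and fuses class scores from four subnets. The abstract does not state how the subnets are fused or how frame differences are computed, so both choices here are assumptions: differences are taken between consecutive frames, and fusion is a plain average of softmax probabilities. The function names, the toy data, and the score values are hypothetical.

```python
import math


def frame_difference(sequence):
    """Assumed motion feature: coordinate-wise difference of consecutive frames."""
    return [
        [b - a for a, b in zip(prev, nxt)]
        for prev, nxt in zip(sequence, sequence[1:])
    ]


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(subnet_scores):
    """Assumed late fusion: average the softmax probabilities of all subnets."""
    probs = [softmax(s) for s in subnet_scores]
    n = len(probs)
    return [sum(p[c] for p in probs) / n for c in range(len(probs[0]))]


# Toy skeleton: 3 frames, each a flattened vector of 2 joints x 2 coordinates.
skeleton = [
    [0.0, 0.0, 1.0, 1.0],
    [0.5, 0.0, 1.0, 1.5],
    [1.0, 0.0, 1.0, 2.0],
]
motion = frame_difference(skeleton)  # 2 difference frames

# Hypothetical class scores (3 classes) from the four subnets.
subnet_scores = [
    [2.0, 1.0, 0.1],  # stand-in for the Two-stream Entirety Net
    [1.5, 1.2, 0.3],  # stand-in for the Body-part Net
    [0.2, 2.5, 0.1],  # stand-in for the Attention Net
    [2.2, 0.9, 0.4],  # stand-in for the Frame-difference Net
]
fused = fuse_scores(subnet_scores)
predicted = max(range(len(fused)), key=fused.__getitem__)
```

In this toy case one subnet prefers a different class than the other three, and averaging lets the majority prevail, which is the intuition behind "two heads are better than one."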
