论文信息 - Skeletal Movement to Color Map: A Novel Representation for 3D Action Recognition with Inception Residual Networks

Skeletal Movement to Color Map: A Novel Representation for 3D Action Recognition with Inception Residual Networks

We propose a novel skeleton-based representation for 3D action recognition in videos using Deep Convolutional Neural Networks (D-CNNs). Two key issues have been addressed: First, how to construct a robust representation that easily captures the spatial-temporal evolutions of motions from skeleton sequences. Second, how to design D-CNNs capable of learning discriminative features from the new representation in a effective manner. To address these tasks, a skeleton-based representation, namely, SPMF (Skeleton Pose-Motion Feature) is proposed. The SPMFs are built from two of the most important properties of a human action: postures and their motions. Therefore, they are able to effectively represent complex actions. For learning and recognition tasks, we design and optimize new D-CNNs based on the idea of Inception Residual networks to predict actions from SPMFs. Our method is evaluated on two challenging datasets including MSR Action3D and NTU-RGB+D. Experimental results indicated that the proposed method surpasses state-of-the-art methods whilst requiring less computation.

Louahdi Khoudour | Sergio A. Velastin | Alain Crouzil | Pablo Zegers | Huy-Hieu Pham

[1] Yong Du,et al. Hierarchical recurrent neural network for skeleton based action recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.

[3] Shuang Wang,et al. Skeleton-based action recognition using LSTM and CNN , 2017, 2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW).

[4] Wanqing Li,et al. Action recognition based on a bag of 3D points , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[5] Sepp Hochreiter,et al. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) , 2015, ICLR.

[6] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[7] Ying Wu,et al. Mining actionlet ensemble for action recognition with depth cameras , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[8] Bing Li,et al. Graph Based Skeleton Motion Representation and Similarity Measurement for Action Recognition , 2016, ECCV.

[9] Junsong Yuan,et al. Spatio-Temporal Naive-Bayes Nearest-Neighbor (ST-NBNN) for Skeleton-Based Action Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10] Sergey Ioffe,et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.

[11] Wenjun Zeng,et al. An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data , 2016, AAAI.

[12] BlakeAndrew,et al. Real-time human pose recognition in parts from single depth images , 2013 .

[13] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[14] Tara N. Sainath,et al. Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[15] Mohammed Bennamoun,et al. Learning Action Recognition Model from Depth and Skeleton Videos , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[16] Jian Sun,et al. Convolutional neural networks at constrained time cost , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17] Wei Liang,et al. Discriminative human action recognition in the learned hierarchical manifold space , 2010, Image Vis. Comput..

[18] Ronald Poppe,et al. A survey on vision-based human action recognition , 2010, Image Vis. Comput..

[19] Jian Sun,et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[20] Gang Wang,et al. NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21] Jian-Huang Lai,et al. Jointly Learning Heterogeneous Features for RGB-D Activity Recognition , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22] Ramakant Nevatia,et al. Recognition and Segmentation of 3-D Human Action Using HMM and Multi-class AdaBoost , 2006, ECCV.

[23] Nasser Kehtarnavaz,et al. Real-time human action recognition based on depth motion maps , 2013, Journal of Real-Time Image Processing.

[24] Rama Chellappa,et al. Human Action Recognition by Representing 3D Skeletons as Points in a Lie Group , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[25] Nitish Srivastava,et al. Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.

[26] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27] Sergey Ioffe,et al. Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28] Xiaoming Liu,et al. On Geometric Features for Skeleton-Based Action Recognition Using Multilayer LSTM Networks , 2017, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[29] Ling Guan,et al. Spatio-Temporal Pyramid Model based on depth maps for action recognition , 2015, 2015 IEEE 17th International Workshop on Multimedia Signal Processing (MMSP).