3D action recognition using multi-temporal skeleton visualization

Action recognition using depth sequences plays an important role in many fields, such as intelligent surveillance and content-based video retrieval. Real applications require robust and accurate action recognition methods. In this paper, we propose a skeleton visualization method that efficiently encodes the spatial-temporal information of skeleton joints into a set of color images. These images serve as inputs to convolutional neural networks, which extract discriminative deep features from them. To enhance the ability of these deep features to capture global relationships, we extend the color images into a multi-temporal version. Additionally, to reduce the effect of viewpoint changes, a spatial transform method is adopted as a preprocessing step. Extensive experiments on the NTU RGB+D dataset and the ICME 2017 challenge show that our method can accurately distinguish similar actions and is robust to view variations.
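The pipeline described above (view normalization, then skeleton-to-color-image encoding, then a multi-temporal extension) can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The body-centered translation with a rotation about the vertical axis, the joint-by-frame RGB layout with per-channel min-max scaling, and the stride-based reading of "multi-temporal" are all assumptions chosen for illustration. `normalize_view`, `to_color_image`, and `multi_temporal_images` are hypothetical names.

```python
import math
from typing import List, Tuple

Joint = Tuple[float, float, float]   # (x, y, z) camera-space coordinates
Frame = List[Joint]                  # one frame holds J joints
Sequence = List[Frame]               # a sequence holds T frames
Pixel = Tuple[int, int, int]         # (R, G, B), each in 0..255
Image = List[List[Pixel]]            # J rows (joints) x T columns (frames)


def normalize_view(seq: Sequence, center: int, left: int, right: int) -> Sequence:
    """Assumed view normalization: move joint `center` of the first frame to the
    origin, then rotate about the vertical y axis so that the first frame's
    left->right joint vector has no z component and points along +x."""
    cx, cy, cz = seq[0][center]
    lx, _, lz = seq[0][left]
    rx, _, rz = seq[0][right]
    theta = math.atan2(rz - lz, rx - lx)  # heading of left->right in the x-z plane
    c, s = math.cos(theta), math.sin(theta)
    out: Sequence = []
    for frame in seq:
        moved = []
        for x, y, z in frame:
            x, y, z = x - cx, y - cy, z - cz
            # Rotate by -theta about y; y (height) is left unchanged.
            moved.append((c * x + s * z, y, -s * x + c * z))
        out.append(moved)
    return out


def to_color_image(seq: Sequence) -> Image:
    """Assumed encoding: pixel (j, t) stores joint j at frame t, with x, y, z
    mapped to R, G, B by per-channel min-max scaling over the whole sequence."""
    points = [p for frame in seq for p in frame]
    lo = [min(p[k] for p in points) for k in range(3)]
    hi = [max(p[k] for p in points) for k in range(3)]

    def scale(v: float, k: int) -> int:
        span = hi[k] - lo[k]
        return 0 if span == 0 else round(255 * (v - lo[k]) / span)

    return [
        [(scale(frame[j][0], 0), scale(frame[j][1], 1), scale(frame[j][2], 2))
         for frame in seq]
        for j in range(len(seq[0]))
    ]


def multi_temporal_images(seq: Sequence,
                          strides: Tuple[int, ...] = (1, 2, 4)) -> List[Image]:
    """Hypothetical multi-temporal variant: one image per temporal stride, so
    coarser strides pack longer-range motion into fewer columns."""
    return [to_color_image(seq[::s]) for s in strides]
```

For a T-frame sequence, calling `multi_temporal_images(normalize_view(seq, center, left, right))` yields one J x T image plus progressively narrower ones. In practice each image would be resized to the CNN's input resolution before feature extraction. Subsampling by stride is only one plausible way to expose longer-range temporal structure; splitting the sequence into segments of different lengths would be an equally reasonable reading.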
