Hierarchical Dropped Convolutional Neural Network for Speed Insensitive Human Action Recognition

With the wide deployment of depth sensors and robust skeleton estimation algorithms, human action recognition from skeleton data has many potential applications in content-based action retrieval and intelligent surveillance. Previous methods encode the spatial-temporal skeleton joints as a compact color image and then use a Convolutional Neural Network (CNN) to extract discriminative deep features. However, these methods ignore the effect of speed variation, a common phenomenon that introduces severe intra-class variation within the same action category. To address this problem, this paper presents a novel hierarchical dropped CNN architecture, constructed in two stages. First, a dropped CNN (d-CNN) is developed to extract deep features from a probabilistic speed-insensitive color image, which expresses both the spatial distribution and the temporal evolution of skeleton joints while suppressing the effect of speed variation. Second, to enhance the temporal discriminative power, the d-CNN is extended to a hierarchical structure (h-CNN) that encodes temporal information at multiple scales. Extensive experiments on the benchmark MSRC-12 dataset and the large-scale NTU RGB+D dataset verify the effectiveness and robustness of the proposed method.
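The abstract does not detail the probabilistic speed-insensitive encoding or the dropped/hierarchical CNN layers themselves. As background only, the sketch below illustrates the generic prior-work idea it builds on: mapping a skeleton sequence to a compact color image that a standard CNN can consume, with joints along one image axis, frames along the other, and the three coordinates treated as RGB channels. The array shape (T, J, 3), the per-channel min-max normalization, and the function name skeleton_to_color_image are illustrative assumptions, not the paper's specification.

```python
import numpy as np
from PIL import Image

def skeleton_to_color_image(joints, out_size=(224, 224)):
    """Map a skeleton sequence to a pseudo-color image for CNN input.

    joints   : float array of shape (T, J, 3) -- T frames, J joints,
               and (x, y, z) coordinates per joint.
    out_size : (width, height) of the resized image fed to the CNN.
    """
    # Normalize each coordinate channel independently to [0, 255],
    # so x, y, z become the R, G, B intensities of the image.
    img = np.empty_like(joints, dtype=np.float32)
    for c in range(3):
        ch = joints[..., c]
        lo, hi = ch.min(), ch.max()
        img[..., c] = 255.0 * (ch - lo) / (hi - lo + 1e-8)

    # (T, J, 3) -> (J, T, 3): joints index the rows, frames the columns.
    img = np.transpose(img, (1, 0, 2)).astype(np.uint8)

    # Resize to the fixed spatial resolution expected by the CNN.
    return np.asarray(Image.fromarray(img).resize(out_size, Image.BILINEAR))
```

Because the column axis of such an image is raw frame index, two performances of the same action at different speeds yield differently stretched images, which is exactly the intra-class variation the proposed speed-insensitive encoding is designed to avoid.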
