Multi-part boosting LSTMs for skeleton-based human activity analysis

This paper presents the methods we adopted for the ICME 2017 large-scale 3D human activity analysis challenge in depth videos. The challenge comprises two tasks: segmented activity recognition and untrimmed activity detection. Our network consists of two LSTM layers, a fully connected layer, and a softmax layer. For the recognition task, we investigate several schemes to handle the varying number of subjects across activities. For the detection task, we extract multi-scale segments from the untrimmed videos and feed them to a proposal network followed by a classification network.
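The recognition backbone described above can be sketched as follows. This is a minimal illustration, not the authors' implementation; PyTorch is assumed, and the dimensions (75 skeleton coordinates per frame, hidden size 100, 49 classes) are hypothetical placeholders.

```python
import torch
import torch.nn as nn

class SkeletonLSTM(nn.Module):
    """Two stacked LSTM layers, a fully connected layer, and a softmax,
    as in the architecture described in the abstract (dimensions assumed)."""

    def __init__(self, input_size=75, hidden_size=100, num_classes=49):
        super().__init__()
        # Two stacked LSTM layers over per-frame skeleton features.
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers=2,
                            batch_first=True)
        # Fully connected layer mapping the final hidden state to class scores.
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, frames, input_size)
        out, _ = self.lstm(x)
        logits = self.fc(out[:, -1, :])      # take the last time step
        return torch.softmax(logits, dim=1)  # per-class probabilities

# Example: 4 skeleton clips of 30 frames each.
probs = SkeletonLSTM()(torch.randn(4, 30, 75))
```

Each row of `probs` is a distribution over activity classes for one clip; training would minimize cross-entropy against the ground-truth labels.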
