论文信息 - A Fine-to-Coarse Convolutional Neural Network for 3D Human Action Recognition

A Fine-to-Coarse Convolutional Neural Network for 3D Human Action Recognition

This paper presents a new framework for human action recognition from a 3D skeleton sequence. Previous studies do not fully utilize the temporal relationships between video segments in a human action. Some studies successfully used very deep Convolutional Neural Network (CNN) models but often suffer from the data insufficiency problem. In this study, we first segment a skeleton sequence into distinct temporal segments in order to exploit the correlations between them. The temporal and spatial features of a skeleton sequence are then extracted simultaneously by utilizing a fine-to-coarse (F2C) CNN architecture optimized for human skeleton sequences. We evaluate our proposed method on NTU RGB+D and SBU Kinect Interaction dataset. It achieves 79.6% and 84.6% of accuracies on NTU RGB+D with cross-object and cross-view protocol, respectively, which are almost identical with the state-of-the-art performance. In addition, our method significantly improves the accuracy of the actions in two-person interactions.

Koichi Shinoda | Nakamasa Inoue | Thao Le Minh | Nakamasa Inoue | K. Shinoda

[1] Günther Palm,et al. Learning convolutional neural networks from few samples , 2013, The 2013 International Joint Conference on Neural Networks (IJCNN).

[2] Koichi Shinoda,et al. Graph regularized implicit pose for 3D human action recognition , 2016, 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA).

[3] Tae Soo Kim,et al. Interpretable 3 D Human Action Analysis with Temporal Convolutional Networks , 2018 .

[4] Hairong Qi,et al. Group Sparsity and Geometry Constrained Dictionary Learning for Action Recognition from Depth Maps , 2013, 2013 IEEE International Conference on Computer Vision.

[5] Bhaskara Marthi,et al. A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs , 2017, Science.

[6] Wenjun Zeng,et al. An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data , 2016, AAAI.

[7] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[9] Hong Liu,et al. Enhanced skeleton visualization for view invariant human action recognition , 2017, Pattern Recognit..

[10] Austin Reiter,et al. Interpretable 3D Human Action Analysis with Temporal Convolutional Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[11] Gang Wang,et al. NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12] Xiaohui Xie,et al. Co-Occurrence Feature Learning for Skeleton Based Action Recognition Using Regularized Deep LSTM Networks , 2016, AAAI.

[13] Rama Chellappa,et al. Human Action Recognition by Representing 3D Skeletons as Points in a Lie Group , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[14] Michael I. Jordan,et al. Learning Transferable Features with Deep Adaptation Networks , 2015, ICML.

[15] Jiaying Liu,et al. Temporal Perceptive Network for Skeleton-Based Action Recognition , 2017, BMVC.

[16] Xiaodong Yang,et al. Effective 3D action recognition using EigenJoints , 2014, J. Vis. Commun. Image Represent..

[17] Ying Wu,et al. Mining actionlet ensemble for action recognition with depth cameras , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[18] Marco La Cascia,et al. 3D skeleton-based human action classification: A survey , 2016, Pattern Recognit..

[19] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[20] Mohammed Bennamoun,et al. SkeletonNet: Mining Deep Part Features for 3-D Action Recognition , 2017, IEEE Signal Processing Letters.

[21] Fei Han,et al. Space-Time Representation of People Based on 3D Skeletal Data: A Review , 2016, Comput. Vis. Image Underst..

[22] Andrew Zisserman,et al. Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[23] Hong Liu,et al. 3 D ACTION RECOGNITION USING DATA VISUALIZATION AND CONVOLUTIONAL NEURAL NETWORKS , 2017 .

[24] Gang Wang,et al. Skeleton-Based Action Recognition Using Spatio-Temporal LSTM Network with Trust Gates , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[26] Yong Du,et al. Hierarchical recurrent neural network for skeleton based action recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[28] Gang Wang,et al. Skeleton-Based Human Action Recognition With Global Context-Aware Attention LSTM Networks , 2017, IEEE Transactions on Image Processing.

[29] Dimitris Samaras,et al. Two-person interaction detection using body-pose features and multiple instance learning , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[30] Qiang Yang,et al. A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[31] Hsieh Hou,et al. Cubic splines for image interpolation and digital filtering , 1978 .

[32] Cristian Sminchisescu,et al. The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[33] Hong Liu,et al. 3D action recognition using data visualization and convolutional neural networks , 2017, 2017 IEEE International Conference on Multimedia and Expo (ICME).

[34] Xiaoming Liu,et al. On Geometric Features for Skeleton-Based Action Recognition Using Multilayer LSTM Networks , 2017, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[35] Nanning Zheng,et al. View Adaptive Recurrent Neural Networks for High Performance Human Action Recognition from Skeleton Data , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[36] Dit-Yan Yeung,et al. A Regularization Approach to Learning Task Relationships in Multitask Learning , 2014, ACM Trans. Knowl. Discov. Data.

[37] Mohammed Bennamoun,et al. A New Representation of Skeleton Sequences for 3D Action Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38] Gang Wang,et al. Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition , 2016, ECCV.