Action Recognition from 3D Skeleton Sequences using Deep Networks on Lie Group Features

This paper addresses the problem of human action recognition from sequences of 3D skeleton data. For this purpose, we combine a deep learning network with geometric features extracted from data lie on a non-Euclidean space, which have been recently shown to be very effective to capture the geometric structure of the human pose. In particular, our approach claims to incorporate the intrinsic nature of the data characterized by Lie Group into deep neural networks and to learn more adequate geometric features for 3D action recognition problem. First, geometric features are extracted from 3D joints of skeleton sequences using the Lie group representation. Then, the network model is built from stacked units of 1-dimensional CNN across the temporal domain. Finally, CNN-features are then used to train an LSTM layer to model dependencies in the temporal domain, and to perform the action recognition. The experimental evaluation is performed on three public datasets containing various challenges: UT-Kinect, Florence 3D-Action and MSR-Action 3D. Results reveal that our approach achieves most of the state-of-the-art performance.

[1]  Jake K. Aggarwal,et al.  View invariant human action recognition using histograms of 3D joints , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[2]  Rama Chellappa,et al.  Rolling Rotations for Recognizing Human Actions from 3D Skeletal Data , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Rama Chellappa,et al.  Human Action Recognition by Representing 3D Skeletons as Points in a Lie Group , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Anuj Srivastava,et al.  Accurate 3D action recognition using learning on the Grassmann manifold , 2015, Pattern Recognit..

[5]  Anoop Cherian,et al.  Tensor Representations via Kernel Linearization for Action Recognition from 3D Skeletons , 2016, ECCV.

[6]  Alberto Del Bimbo,et al.  Submitted to Ieee Transactions on Cybernetics 1 3d Human Action Recognition by Shape Analysis of Motion Trajectories on Riemannian Manifold , 2022 .

[7]  Guo-Jun Qi,et al.  Differential Recurrent Neural Networks for Action Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[8]  Alberto Del Bimbo,et al.  Combined shape analysis of human poses and motion units for action segmentation and recognition , 2015, 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[9]  Ying Wu,et al.  Mining actionlet ensemble for action recognition with depth cameras , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Jürgen Schmidhuber,et al.  Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Alan L. Yuille,et al.  Mining 3D Key-Pose-Motifs for Action Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Yong Du,et al.  Hierarchical recurrent neural network for skeleton based action recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Rushil Anirudh,et al.  Elastic Functional Coding of Riemannian Trajectories , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Tara N. Sainath,et al.  Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[15]  Yoshua Bengio,et al.  Convolutional networks for images, speech, and time series , 1998 .

[16]  Jason Weston,et al.  Memory Networks , 2014, ICLR.

[17]  B. Hall Lie Groups, Lie Algebras, and Representations: An Elementary Introduction , 2004 .

[18]  Mohammed Bennamoun,et al.  A New Representation of Skeleton Sequences for 3D Action Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Alberto Del Bimbo,et al.  Recognizing Actions from Depth Cameras as Weakly Aligned Multi-part Bag-of-Poses , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[20]  David Picard,et al.  Learning features combination for human action recognition from skeleton sequences , 2017, Pattern Recognit. Lett..

[21]  Wanqing Li,et al.  Action recognition based on a bag of 3D points , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[22]  Luc Van Gool,et al.  Deep Learning on Lie Groups for Skeleton-Based Action Recognition , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Xiaoming Liu,et al.  On Geometric Features for Skeleton-Based Action Recognition Using Multilayer LSTM Networks , 2017, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[24]  Hazem Wannous,et al.  Grassmannian Representation of Motion Depth for 3D Human Gesture and Action Recognition , 2014, 2014 22nd International Conference on Pattern Recognition.

[25]  Gang Wang,et al.  Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition , 2016, ECCV.

[26]  Sanghoon Lee,et al.  Ensemble Deep Learning for Skeleton-Based Action Recognition Using Temporal Sliding LSTM Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).