An extension of kernel learning methods using a modified Log-Euclidean distance for fast and accurate skeleton-based Human Action Recognition

Abstract: In this article, we introduce a fast, accurate and invariant method for RGB-D based human action recognition using a Hierarchical Kinematic Covariance (HKC) descriptor. Recently, non-singular covariance matrices of pattern features, which are elements of the space of Symmetric Positive Definite (SPD) matrices, have proven to be very efficient descriptors in the field of pattern recognition. In action recognition, however, singular covariance matrices cannot be avoided because the dimension of the features can exceed the number of samples. Both kinds of covariance matrices (non-singular and singular) belong to the space of Symmetric Positive semi-Definite (SPsD) matrices. To classify actions, we therefore propose to adapt kernel methods such as Support Vector Machines (SVM) and Multiple Kernel Learning (MKL) to the space of SPsD matrices by using a perturbed Log-Euclidean distance (Arsigny et al., 2006). We study the mathematical validity of this perturbed distance, called the Modified Log-Euclidean distance, on SPsD matrices. Offline experiments are conducted on three challenging benchmarks: the MSRAction3D, UTKinect and Multiview3D datasets. A fair comparison demonstrates that our approach competes with state-of-the-art methods in terms of accuracy and computational latency. Finally, our method is extended to an online scenario, and experiments on MSRC12 demonstrate the efficiency of this extension.
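
To make the idea of a perturbed Log-Euclidean distance concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state the exact form of the perturbation, so the code assumes the common choice of shifting each matrix by a small multiple of the identity (`eps * I`) before taking the matrix logarithm. The function names, the value of `eps`, the Gaussian kernel form and `gamma` are all illustrative assumptions.

```python
import numpy as np


def perturbed_log(mat, eps=1e-6):
    """Matrix logarithm of (mat + eps * I) for a symmetric PSD matrix.

    ASSUMPTION: the perturbation is an identity shift; the abstract does
    not specify the form used in the paper.
    """
    sym = 0.5 * (mat + mat.T)  # remove numerical asymmetry
    eigvals, eigvecs = np.linalg.eigh(sym + eps * np.eye(sym.shape[0]))
    # Rebuild V * diag(log(lambda)) * V^T from the eigendecomposition.
    return (eigvecs * np.log(eigvals)) @ eigvecs.T


def modified_log_euclidean_distance(a, b, eps=1e-6):
    """Frobenius norm between the logarithms of the perturbed matrices."""
    return np.linalg.norm(perturbed_log(a, eps) - perturbed_log(b, eps), ord="fro")


def gaussian_gram(mats, gamma=0.01, eps=1e-6):
    """Gram matrix K[i, j] = exp(-gamma * d(mats[i], mats[j]) ** 2).

    ASSUMPTION: a Gaussian kernel is used here only as an illustration.
    """
    logs = [perturbed_log(m, eps) for m in mats]
    n = len(logs)
    gram = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            dist = np.linalg.norm(logs[i] - logs[j], ord="fro")
            gram[i, j] = np.exp(-gamma * dist ** 2)
    return gram


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 3 samples of 5-dimensional features: each covariance is singular.
    covs = [np.cov(rng.standard_normal((3, 5)), rowvar=False) for _ in range(4)]
    print(gaussian_gram(covs))
```

Because the distance is a Euclidean norm between matrix logarithms, the resulting Gaussian Gram matrix is positive semi-definite, so it could in principle be handed to any SVM solver that accepts a precomputed kernel.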

[1]  Mohan M. Trivedi,et al.  Joint Angles Similarities and HOG2 for Action Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[2]  Wanqing Li,et al.  Action recognition based on a bag of 3D points , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[3]  Stéphane Lecoeuche,et al.  A fast and accurate motion descriptor for human action recognition applications , 2016, 2016 23rd International Conference on Pattern Recognition (ICPR).

[4]  Jing Zhang,et al.  Action Recognition From Depth Maps Using Deep Convolutional Neural Networks , 2016, IEEE Transactions on Human-Machine Systems.

[5]  Pichao Wang,et al.  Online human action recognition based on incremental learning of weighted covariance descriptors , 2018, Inf. Sci..

[6]  Nicholas Ayache,et al.  Geometric Means in a Novel Vector Space Structure on Symmetric Positive-Definite Matrices , 2007, SIAM J. Matrix Anal. Appl..

[7]  Stéphane Lecoeuche,et al.  Kinematic Spline Curves: A temporal invariant descriptor for fast action recognition , 2018, Image Vis. Comput..

[8]  Zhi Liu,et al.  3D-based Deep Convolutional Neural Network for action recognition with depth sequences , 2016, Image Vis. Comput..

[9]  Jake K. Aggarwal,et al.  View invariant human action recognition using histograms of 3D joints , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[10]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[11]  Georgios Evangelidis,et al.  Skeletal Quads: Human Action Recognition Using Joint Quadruples , 2014, 2014 22nd International Conference on Pattern Recognition.

[12]  Xuelong Li,et al.  Gabor-Based Region Covariance Matrices for Face Recognition , 2008, IEEE Transactions on Circuits and Systems for Video Technology.

[13]  Anuj Srivastava,et al.  Action Recognition Using Rate-Invariant Analysis of Skeletal Shape Trajectories , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  W. Förstner,et al.  A Metric for Covariance Matrices , 2003 .

[15]  Larry S. Davis,et al.  Covariance discriminative learning: A natural and efficient approach to image set classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  S. Sra Positive definite matrices and the Symmetric Stein Divergence , 2011 .

[17]  Rama Chellappa,et al.  Human Action Recognition by Representing 3D Skeletons as Points in a Lie Group , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Ehud Rivlin,et al.  Online action recognition using covariance of shape and motion , 2014, Comput. Vis. Image Underst..

[19]  Petros Daras,et al.  Real-Time Skeleton-Tracking-Based Human Action Recognition Using Kinect Data , 2014, MMM.

[20]  Zicheng Liu,et al.  HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Silvere Bonnabel,et al.  Riemannian Metric and Geometric Mean for Positive Semidefinite Matrices of Fixed Rank , 2008, SIAM J. Matrix Anal. Appl..

[22]  Hongdong Li,et al.  Kernel Methods on the Riemannian Manifold of Symmetric Positive Definite Matrices , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Alessia Saggese,et al.  Action recognition by using kernels on aclets sequences , 2016, Comput. Vis. Image Underst..

[24]  G. Johansson Visual perception of biological motion and a model for its analysis , 1973 .

[25]  I. J. Schoenberg,et al.  Metric spaces and positive definite functions , 1938 .

[26]  Stéphane Lecoeuche,et al.  3D real-time human action recognition using a spline interpolation approach , 2015, 2015 International Conference on Image Processing Theory, Tools and Applications (IPTA).

[27]  Anton van den Hengel,et al.  Learning discriminative trajectorylet detector sets for accurate skeleton-based action recognition , 2015, Pattern Recognit..

[28]  Xiaodong Yang,et al.  EigenJoints-based action recognition using Naïve-Bayes-Nearest-Neighbor , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[29]  Anthony Fleury,et al.  Toward a Real Time View-invariant 3D Action Recognition , 2016, VISIGRAPP.

[30]  Eshed Ohn-Bar,et al.  Joint Angles Similarities and HOG2 for Action Recognition , 2013 .

[31]  Fatih Murat Porikli,et al.  Region Covariance: A Fast Descriptor for Detection and Classification , 2006, ECCV.

[32]  Helena M. Mentis,et al.  Instructing people for training gestural interactive systems , 2012, CHI.

[33]  Guodong Guo,et al.  Fusing Spatiotemporal Features and Joints for 3D Action Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[34]  C. Klingenberg Cranial integration and modularity: insights into evolution and development from morphometric data , 2013 .

[35]  Xiaodong Yang,et al.  Super Normal Vector for Activity Recognition Using Depth Sequences , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Cristian Sminchisescu,et al.  The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[37]  Ling Shao,et al.  From handcrafted to learned representations for human action recognition: A survey , 2016, Image Vis. Comput..

[38]  Marwan Torki,et al.  Human Action Recognition Using a Temporal Hierarchy of Covariance Descriptors on 3D Joint Locations , 2013, IJCAI.

[39]  Ronald Poppe,et al.  A survey on vision-based human action recognition , 2010, Image Vis. Comput..

[40]  Ying Wu,et al.  Mining actionlet ensemble for action recognition with depth cameras , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Stéphane Lecoeuche,et al.  Incremental and Decremental Multi-category Classification by Support Vector Machines , 2009, 2009 International Conference on Machine Learning and Applications.

[42]  Pichao Wang,et al.  Skeleton Optical Spectra-Based Action Recognition Using Convolutional Neural Networks , 2018, IEEE Transactions on Circuits and Systems for Video Technology.

[43]  N. Ayache,et al.  Log‐Euclidean metrics for fast and simple calculus on diffusion tensors , 2006, Magnetic resonance in medicine.

[44]  Arif Mahmood,et al.  Histogram of Oriented Principal Components for Cross-View Action Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[45]  Rémi Ronfard,et al.  A survey of vision-based methods for action representation, segmentation and recognition , 2011, Comput. Vis. Image Underst..