Online human action recognition based on incremental learning of weighted covariance descriptors

Abstract: Unlike traditional action recognition, which operates on pre-segmented video clips, online action recognition aims to recognize actions continuously from an unsegmented stream of data. One approach to online recognition accumulates evidence over time. This paper presents an effective framework following this approach for online action recognition from a stream of noisy skeleton data, using a weighted covariance descriptor to accumulate information. In particular, a fast incremental updating method for the weighted covariance descriptor is developed. The descriptor follows two principles: older frames contribute less to the accumulated evidence, while recent and informative frames, such as key frames, contribute more. To determine the discriminativeness of each frame, an effective pseudo-neutral pose is proposed to recover the neutral pose from an arbitrary pose in a frame. Two recognition methods are developed using the weighted covariance descriptor. The first applies nearest neighbor search over a set of trained actions using a Riemannian metric on covariance matrices. The second uses a Log-Euclidean kernel-based SVM. Extensive experiments on the MSRC-12 Kinect Gesture dataset, the Online RGBD Action dataset, and our newly collected online action recognition dataset demonstrate the efficacy of the proposed framework in terms of latency, miss rate, and error rate.
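The abstract does not give the update rule itself, so the following is only a minimal sketch of how an incrementally updated weighted covariance could work, assuming an exponential forgetting factor for past frames and a per-frame informativeness weight. The class name `WeightedCovariance`, the parameter `forgetting`, and the helper `batch_weighted_covariance` are hypothetical and are not taken from the paper.

```python
import numpy as np


class WeightedCovariance:
    """Running weighted covariance with exponential forgetting (illustrative only)."""

    def __init__(self, dim, forgetting=0.95):
        self.forgetting = forgetting          # assumed decay applied to all past frames
        self.total_weight = 0.0               # sum of decayed frame weights
        self.mean = np.zeros(dim)             # weighted mean of the feature vectors
        self.scatter = np.zeros((dim, dim))   # weighted sum of outer products about the mean

    def update(self, x, weight=1.0):
        """Fold one frame's feature vector into the descriptor in O(dim^2)."""
        x = np.asarray(x, dtype=float)
        # Decay the accumulated evidence so that older frames count less.
        self.scatter *= self.forgetting
        new_total = self.forgetting * self.total_weight + weight
        # Weighted one-pass update of the mean and the scatter matrix.
        delta = x - self.mean
        self.mean = self.mean + (weight / new_total) * delta
        self.scatter += weight * np.outer(delta, x - self.mean)
        self.total_weight = new_total

    def covariance(self):
        return self.scatter / self.total_weight


def batch_weighted_covariance(frames, weights, forgetting):
    """Reference computation over the whole history, used only as a check."""
    frames = np.asarray(frames, dtype=float)
    n = len(frames)
    # Frame i has been decayed (n - 1 - i) times by the time the last frame arrives.
    eff = np.asarray(weights, dtype=float) * forgetting ** np.arange(n - 1, -1, -1)
    mean = eff @ frames / eff.sum()
    centered = frames - mean
    return (centered * eff[:, None]).T @ centered / eff.sum()
```

In such a scheme, a larger `weight` would be passed for frames judged more informative (for example, key frames), while `forgetting` below 1 shrinks the influence of older frames at every step. Each update touches only the stored mean and scatter matrix, so the cost per frame does not grow with the length of the stream.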
