Human action recognition based on point context tensor shape descriptor

Motion trajectory recognition is one of the most important means of determining the identity of a moving object, and a compact, discriminative feature representation can improve trajectory recognition accuracy. This paper presents an efficient framework for action recognition based on a three-dimensional skeletal kinematic joint model. First, we propose a rotation-, scale-, and translation-invariant shape descriptor that combines point context (PC) with the normal vector of the hypersurface to jointly characterize local motion and shape information. In addition, an algorithm that extracts key trajectories based on a confidence coefficient is proposed to reduce randomness and computational complexity. Second, to reduce the time complexity of eigenvalue decomposition, we propose a PC-based tensor shape descriptor (TSD) that globally captures the spatial layout and temporal order while preserving the spatial information of each frame. A multilinear projection is then carried out by tensor dynamic time warping, which maps the TSD to a low-dimensional tensor subspace of the same size. Experimental results show that the proposed shape descriptor is effective and feasible, and that the proposed approach achieves a considerable accuracy improvement over state-of-the-art approaches on a public action dataset.
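The tensor dynamic time warping step above builds on classic dynamic time warping, which aligns two variable-length sequences of per-frame feature vectors by minimizing accumulated local distances. The paper's tensor variant is not reproduced here; the following is only a minimal sketch of standard DTW with Euclidean frame distances, using hypothetical sequence inputs:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping between two sequences of
    per-frame feature vectors (one row per frame).
    Returns the minimal accumulated alignment cost."""
    n, m = len(a), len(b)
    # cost[i, j] = minimal cost of aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])  # local frame distance
            cost[i, j] = d + min(cost[i - 1, j],      # skip a frame of b
                                 cost[i, j - 1],      # skip a frame of a
                                 cost[i - 1, j - 1])  # match frames
    return cost[n, m]

# Two toy trajectories: identical up to a repeated frame, so DTW cost is 0.
a = np.array([[0.0], [1.0], [2.0]])
b = np.array([[0.0], [1.0], [1.0], [2.0]])
print(dtw_distance(a, b))  # → 0.0
```

A nearest-neighbor classifier over such alignment costs is a common baseline for trajectory recognition; the tensor formulation in the paper instead operates on the multi-dimensional TSD rather than flattened vectors.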
