Graph regularized implicit pose for 3D human action recognition

We present a novel feature descriptor for 3D human action recognition based on graph signal processing techniques. A linear subspace is learned using graph total variation and graph Tikhonov regularizers, transforming 3D time-derivative information into a representation that is robust to noisy skeleton measurements. The graph total variation regularizer encourages a piecewise-constant action representation, which helps discriminate between different action classes. Graph Tikhonov regularization keeps the learned low-rank representation close to the original features. Experiments show that the explicit graph structure helps our approach learn a discriminative action representation, and that it achieves a statistically significant improvement over the baseline moving pose method, reaching 93.5% accuracy on the challenging MSRAction3D dataset.
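To make the two regularizers concrete, the following minimal sketch evaluates the textbook forms of graph total variation (a weighted sum of absolute differences across edges) and the graph Tikhonov, or Laplacian quadratic, penalty (a weighted sum of squared differences across edges) on a toy signal. The 4-node chain graph, the unit edge weights, and the function names are assumptions made for illustration only; the exact objective and graph construction used in the paper may differ.

```python
# Hypothetical toy graph: a 4-node chain with unit edge weights,
# stored as (node_i, node_j, weight) triples. Not the paper's graph.
EDGES = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]


def graph_total_variation(x, edges):
    """Weighted sum of absolute differences across edges (an l1-type penalty)."""
    return sum(w * abs(x[i] - x[j]) for i, j, w in edges)


def graph_tikhonov(x, edges):
    """Weighted sum of squared differences across edges.

    This equals the Laplacian quadratic form x^T L x for the graph
    Laplacian L built from the same edge weights.
    """
    return sum(w * (x[i] - x[j]) ** 2 for i, j, w in edges)


# Two signals with the same end points: one sharp step, one gradual ramp.
step = [0.0, 0.0, 1.0, 1.0]
ramp = [0.0, 1.0 / 3.0, 2.0 / 3.0, 1.0]

tv_step = graph_total_variation(step, EDGES)
tv_ramp = graph_total_variation(ramp, EDGES)
tik_step = graph_tikhonov(step, EDGES)
tik_ramp = graph_tikhonov(ramp, EDGES)

print("graph TV:       step =", tv_step, " ramp =", round(tv_ramp, 6))
print("graph Tikhonov: step =", tik_step, " ramp =", round(tik_ramp, 6))
```

On this toy graph, total variation charges the step and the ramp the same cost, so it does not penalize a sharp jump more than a gradual change and therefore tolerates piecewise-constant signals. The quadratic Tikhonov penalty charges the step more than the ramp, so it favors smooth variation across the graph.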
