
Graph regularized implicit pose for 3D human action recognition

We present a novel feature descriptor for 3D human action recognition based on graph signal processing techniques. A linear subspace is learned with graph total variation and graph Tikhonov regularizers, transforming 3D time-derivative information into a representation that is robust to noisy skeleton measurements. The graph total variation regularizer encourages a piecewise-constant action representation, which helps discriminate between action classes, while the graph Tikhonov regularizer keeps the learned low-rank subspace close to the original features. Experiments show that, owing to the explicit graph structure, our approach learns an effective action representation and achieves a statistically significant improvement over the baseline moving pose method, reaching 93.5% accuracy on the challenging MSRAction3D dataset.
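To make the two regularizers concrete, here is a minimal NumPy sketch of how they act on per-joint time-derivative features. It is only an illustration under stated assumptions, not the paper's subspace-learning formulation. The 6-joint chain skeleton, the central-difference velocity feature, the denoising objective 0.5·‖Y − X‖² + α·tr(YᵀLY) + β·‖KY‖₁, the Chambolle-Pock primal-dual solver, and all function and parameter names (`regularized_smooth`, `alpha`, `beta`, etc.) are hypothetical choices. The Tikhonov term ‖KY‖²_F = tr(YᵀLY), with L = KᵀK the graph Laplacian, penalizes squared differences between connected joints. The total variation term ‖KY‖₁ penalizes absolute differences, which tends to produce piecewise-constant signals over the graph.

```python
import numpy as np

# HYPOTHETICAL toy skeleton: 6 joints in a chain 0-1-2-3-4-5 (real skeletons are trees with ~20 joints).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]


def incidence_matrix(edges, n_nodes):
    """Unweighted edge-by-node difference operator K: (K @ X)[e] = X[i] - X[j]."""
    K = np.zeros((len(edges), n_nodes))
    for e, (i, j) in enumerate(edges):
        K[e, i], K[e, j] = 1.0, -1.0
    return K


def joint_velocities(P):
    """Central-difference time derivative of joint positions P with shape (T, joints, 3)."""
    return np.gradient(P, axis=0)


def graph_tikhonov(X, K):
    """Graph Tikhonov term tr(X^T L X) with Laplacian L = K^T K."""
    return float(np.sum((K @ X) ** 2))


def graph_tv(X, K):
    """Anisotropic graph total variation: sum over edges and coordinates of |X[i] - X[j]|."""
    return float(np.sum(np.abs(K @ X)))


def objective(Y, X, K, alpha, beta):
    """ASSUMED denoising objective combining both graph regularizers."""
    return 0.5 * np.sum((Y - X) ** 2) + alpha * graph_tikhonov(Y, K) + beta * graph_tv(Y, K)


def regularized_smooth(X, K, alpha, beta, n_iter=1000):
    """Chambolle-Pock iterations for
    min_Y 0.5*||Y - X||_F^2 + alpha*tr(Y^T L Y) + beta*||K Y||_1."""
    n = X.shape[0]
    L = K.T @ K
    tau = sigma = 0.99 / np.linalg.norm(K, 2)  # guarantees tau * sigma * ||K||^2 < 1
    # prox of tau*f (the smooth quadratic part) is a linear solve with this fixed matrix
    A = (1.0 + tau) * np.eye(n) + 2.0 * alpha * tau * L
    Y = X.copy()
    Y_bar = X.copy()
    Z = np.zeros((K.shape[0], X.shape[1]))  # dual variable, one row per edge
    for _ in range(n_iter):
        # dual step: prox of sigma*g* is the projection onto the l-infinity ball of radius beta
        Z = np.clip(Z + sigma * (K @ Y_bar), -beta, beta)
        # primal step: prox of tau*f evaluated at Y - tau * K^T Z
        Y_new = np.linalg.solve(A, Y - tau * (K.T @ Z) + tau * X)
        Y_bar = 2.0 * Y_new - Y  # over-relaxation with theta = 1
        Y = Y_new
    return Y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T = 10
    # Toy sequence: joints 3-5 translate along x at unit speed, joints 0-2 stay still.
    motion = np.zeros((T, 6, 3))
    motion[:, 3:, 0] = np.arange(T)[:, None]
    P = motion + 0.05 * rng.standard_normal(motion.shape)  # simulated sensor noise
    X = joint_velocities(P)[T // 2]  # per-joint 3D velocity at one frame, shape (6, 3)
    K = incidence_matrix(EDGES, 6)
    Y = regularized_smooth(X, K, alpha=0.1, beta=0.05)
    print("objective before:", objective(X, X, K, 0.1, 0.05))
    print("objective after: ", objective(Y, X, K, 0.1, 0.05))
    print(np.round(Y, 2))
```

In this toy setup, raising `beta` pushes neighboring joints toward identical values (piecewise constancy), while raising `alpha` smooths more uniformly across the chain. With `beta=0` the problem reduces to the closed-form Tikhonov solution (I + 2αL)⁻¹X. The paper instead learns a linear projection; this sketch only shows how the two regularizers behave on a single feature.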
