Grassmannian Representation of Motion Depth for 3D Human Gesture and Action Recognition

Recently developed commodity depth sensors open up new possibilities for computing rich descriptors that capture geometrical features of the observed scene. Here, we propose an original approach to represent geometric features extracted from the depth motion space, capturing both the geometric appearance and the dynamics of the human body simultaneously. In this approach, sequence features are modeled temporally as subspaces lying on the Grassmann manifold. Classification is carried out by computing probability density functions on the tangent space of each class, taking advantage of the geometric structure of the Grassmann manifold. The experimental evaluation is performed on three existing datasets containing various challenges: MSR-Action3D, UT-Kinect, and MSR-Gesture3D. Results reveal that our approach outperforms state-of-the-art methods, with accuracies of 98.21% on MSR-Gesture3D and 95.25% on UT-Kinect, and achieves a competitive performance of 86.21% on MSR-Action3D.
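The first ingredient of this pipeline, modeling a feature sequence as a subspace on the Grassmann manifold and comparing subspaces through their principal angles, can be illustrated with a minimal sketch. The function names, feature dimension, and subspace order below are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

def sequence_to_subspace(features, k):
    """Represent a feature sequence (d x n matrix, one column per frame)
    by the k-dimensional subspace spanned by its leading left singular
    vectors; the orthonormal basis is a point on Gr(k, d)."""
    U, _, _ = np.linalg.svd(features, full_matrices=False)
    return U[:, :k]

def grassmann_distance(X, Y):
    """Geodesic distance between two subspaces with orthonormal bases
    X and Y (both d x k), via the principal angles between them."""
    s = np.linalg.svd(X.T @ Y, compute_uv=False)          # cosines of principal angles
    theta = np.arccos(np.clip(s, -1.0, 1.0))              # principal angles
    return np.linalg.norm(theta)

# Toy example: two random "sequences" of 10 frames of 32-dim features.
rng = np.random.default_rng(0)
A = sequence_to_subspace(rng.standard_normal((32, 10)), k=5)
B = sequence_to_subspace(rng.standard_normal((32, 10)), k=5)
print(grassmann_distance(A, A))  # ~0: a subspace is at distance zero from itself
print(grassmann_distance(A, B))  # positive for two generic subspaces
```

The paper goes further, classifying via densities estimated on per-class tangent spaces rather than raw pairwise distances, but the subspace representation above is the common starting point for such Grassmannian methods.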
