Grassmannian Sparse Representations and Motion Depth Surfaces for 3D Action Recognition

Manifold learning has been used effectively in computer vision for dimensionality reduction, which improves classification performance and reduces computational load. Grassmann manifolds are well suited to computer vision problems because they are smooth manifolds whose points are subspaces. In this paper we propose Grassmannian Sparse Representations (GSR), a novel subspace learning algorithm that combines the benefits of Grassmann manifolds with sparse representations, using L1-norm minimization with a least-squares loss for classification. We further introduce a new descriptor, which we term the Motion Depth Surface (MDS), and compare its classification performance against the traditional Motion History Image (MHI) descriptor. We demonstrate the effectiveness of GSR on computationally intensive 3D action sequences from the Microsoft Research 3D-Action and 3D-Gesture datasets.
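
To make the general idea concrete, the sketch below shows one way sparse-representation classification over subspaces could look. It is an illustration under stated assumptions, not the GSR algorithm itself: the abstract does not specify the manifold embedding or the solver, so the projection embedding (Y Y^T), the ISTA solver, the class-wise residual rule, the synthetic data, and all function names and parameter values here are hypothetical choices.

```python
import numpy as np

def projection_embedding(basis):
    """Map an orthonormal d x p basis Y to the unit-norm vectorised projector Y Y^T.
    Assumption: this embedding is chosen for illustration only."""
    p = basis.shape[1]
    return (basis @ basis.T).ravel() / np.sqrt(p)

def sequence_to_atom(frames, p):
    """Represent a d x n matrix of frame features by its top-p left singular subspace."""
    u, _, _ = np.linalg.svd(frames, full_matrices=False)
    return projection_embedding(u[:, :p])

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_ista(D, y, lam=0.05, n_iter=300):
    """Minimise 0.5 * ||D a - y||_2^2 + lam * ||a||_1 with ISTA (assumed solver)."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - y)
        a = soft_threshold(a - step * grad, step * lam)
    return a

def classify(D, labels, y, lam=0.05):
    """Assign y to the class whose atoms reconstruct it with the smallest residual."""
    a = lasso_ista(D, y, lam)
    best_class, best_residual = None, np.inf
    for c in np.unique(labels):
        mask = labels == c
        residual = np.linalg.norm(y - D[:, mask] @ a[mask])
        if residual < best_residual:
            best_class, best_residual = int(c), residual
    return best_class

def random_sequence(rng, base, n_frames=12, noise=0.01):
    """Synthetic stand-in for an action sequence: noisy frames near a base subspace."""
    coeffs = rng.standard_normal((base.shape[1], n_frames))
    return base @ coeffs + noise * rng.standard_normal((base.shape[0], n_frames))

def demo(seed=0):
    """Build a two-class dictionary of subspace atoms and classify one query per class."""
    rng = np.random.default_rng(seed)
    d, p = 20, 3  # hypothetical feature dimension and subspace dimension
    bases = [np.linalg.qr(rng.standard_normal((d, p)))[0] for _ in range(2)]
    atoms, labels = [], []
    for c, base in enumerate(bases):
        for _ in range(5):
            atoms.append(sequence_to_atom(random_sequence(rng, base), p))
            labels.append(c)
    D = np.stack(atoms, axis=1)
    labels = np.array(labels)
    queries = [sequence_to_atom(random_sequence(rng, base), p) for base in bases]
    return tuple(classify(D, labels, q) for q in queries)

if __name__ == "__main__":
    print(demo())
```

The projector Y Y^T depends only on the subspace and not on the particular basis chosen for it, which is why it is a convenient Euclidean stand-in for a point on the Grassmann manifold in this sketch.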
