Accurate 3D action recognition using learning on the Grassmann manifold

In this paper we address the problem of modeling and analyzing human motion by focusing on 3D body skeletons. Our aim is to represent skeletal motion in a geometric and efficient way, leading to an accurate action-recognition system. An action is represented by a dynamical system whose observability matrix is characterized as an element of a Grassmann manifold. We propose two distinct learning approaches: (1) classification using a Truncated Wrapped Gaussian model, with one model per class, each defined in that class's own tangent space; and (2) a novel learning algorithm that forms a vector representation by concatenating local coordinates in the tangent spaces associated with the different classes and trains a linear SVM on it. We evaluate our approaches on three public 3D action datasets: MSR-action 3D, UT-kinect and UCF-kinect. These datasets pose different kinds of challenges and together support a thorough evaluation. The results show that our approaches either match or exceed state-of-the-art performance, reaching 91.21% on MSR-action 3D, 97.91% on UCF-kinect, and 88.5% on UT-kinect. Finally, we evaluate the latency of our approach, i.e. its ability to recognize an action before the action terminates, and demonstrate improvements relative to other published approaches.

Highlights
A human action recognition approach which represents a skeletal sequence as a point on the Grassmann manifold.
A new learning algorithm is introduced for learning human actions.
Experiments are performed on three public datasets.
Promising success rates are achieved, showing good accuracy and better latency performance.
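The pipeline outlined in the abstract (sequence to dynamical system, to observability matrix, to Grassmann point, to tangent-space coordinates) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `lds_subspace`, `grassmann_log`, `grassmann_exp` and `tangent_features` are hypothetical, as are the parameters `d` (state dimension) and `m` (number of observability blocks). The dynamical system is fitted with a common SVD-based estimate, which the abstract does not specify, and the per-class reference points are assumed to be given (for example, class means computed elsewhere).

```python
import numpy as np

def lds_subspace(Y, d=3, m=3):
    """Map a sequence Y (features x frames) to an orthonormal basis (m*p, d).

    Assumption: SVD-based fit of a linear dynamical system, then a truncated
    observability matrix [C; CA; ...; CA^(m-1)] orthonormalized by QR.
    """
    Y = Y - Y.mean(axis=1, keepdims=True)          # remove the mean pose
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :d]                                   # measurement matrix
    X = np.diag(s[:d]) @ Vt[:d, :]                 # hidden states, shape (d, T)
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])       # least-squares transition
    blocks = [C]
    for _ in range(m - 1):
        blocks.append(blocks[-1] @ A)              # C A^k
    Q, _ = np.linalg.qr(np.vstack(blocks))         # basis of the column space
    return Q

def grassmann_log(X, Y):
    """Tangent vector at span(X) pointing toward span(Y).

    X, Y have orthonormal columns; X.T @ Y is assumed invertible.
    """
    M = X.T @ Y
    B = (Y - X @ M) @ np.linalg.inv(M)             # component orthogonal to X
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    return U @ np.diag(np.arctan(s)) @ Vt          # arctan(s) = principal angles

def grassmann_exp(X, H):
    """Follow the geodesic from span(X) along tangent vector H for unit time."""
    U, theta, Vt = np.linalg.svd(H, full_matrices=False)
    return X @ Vt.T @ np.diag(np.cos(theta)) @ Vt + U @ np.diag(np.sin(theta)) @ Vt

def tangent_features(Y, class_refs):
    """Concatenate the tangent-space coordinates of Y at each class reference."""
    return np.concatenate([grassmann_log(R, Y).ravel() for R in class_refs])
```

Vectors returned by `tangent_features` live in an ordinary Euclidean space, so they could be passed to any off-the-shelf linear SVM. A Gaussian fitted to the output of `grassmann_log` at a single class reference point would correspond loosely to the per-class tangent-space model mentioned in the abstract, without the truncation step.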
