A Novel Geometric Framework on Gram Matrix Trajectories for Human Behavior Understanding

In this paper, we propose a novel space-time geometric representation of human landmark configurations and derive tools for comparison and classification. We model the temporal evolution of landmarks as parametrized trajectories on the Riemannian manifold of positive semidefinite matrices of fixed rank. A benefit of this representation is that it naturally brings a second desirable quantity when comparing shapes, the spatial covariance, in addition to the conventional affine-shape representation. We then derive geometric and computational tools for rate-invariant analysis and adaptive re-sampling of trajectories, grounded in the Riemannian geometry of the underlying manifold. Specifically, our approach involves three steps: (1) landmarks are first mapped into the Riemannian manifold of positive semidefinite matrices of fixed rank to build time-parameterized trajectories; (2) a temporal warping is performed on the trajectories, providing a geometry-aware (dis-)similarity measure between them; (3) finally, a pairwise proximity function SVM is used to classify them, incorporating the (dis-)similarity measure into the kernel function. We show that this representation and metric achieve competitive results in applications such as action recognition and emotion recognition from 3D skeletal data, and facial expression recognition from videos. Experiments have been conducted on several publicly available up-to-date benchmarks.
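The three steps above can be illustrated with a minimal sketch in Python with NumPy. It is not the authors' implementation. All function names are hypothetical, the Frobenius distance is only a stand-in for the Riemannian distance on the manifold of fixed-rank positive semidefinite matrices, and plain dynamic time warping is assumed for the temporal warping step. The final function only builds the pairwise proximity features on which an SVM would be trained; the classifier itself is left out.

```python
import numpy as np


def gram_trajectory(landmarks):
    """Step 1: map a (T, n, d) landmark sequence to (T, n, n) Gram matrices."""
    landmarks = np.asarray(landmarks, dtype=float)
    # Center each frame so the representation ignores translation.
    centered = landmarks - landmarks.mean(axis=1, keepdims=True)
    # Each G = Z Z^T is positive semidefinite with rank at most d.
    return centered @ centered.transpose(0, 2, 1)


def frobenius_distance(g1, g2):
    # Assumption: Euclidean stand-in, NOT the Riemannian metric of the paper.
    return float(np.linalg.norm(g1 - g2))


def dtw_dissimilarity(traj_a, traj_b, dist=frobenius_distance):
    """Step 2: dynamic time warping cost between two trajectories."""
    ta, tb = len(traj_a), len(traj_b)
    cost = np.full((ta + 1, tb + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, ta + 1):
        for j in range(1, tb + 1):
            step = dist(traj_a[i - 1], traj_b[j - 1])
            cost[i, j] = step + min(
                cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1]
            )
    return float(cost[ta, tb])


def proximity_features(trajectories, references):
    """Step 3 (input side): describe each trajectory by its dissimilarities
    to a reference set; an SVM would then be trained on these vectors."""
    return np.array(
        [[dtw_dissimilarity(t, r) for r in references] for t in trajectories]
    )


# Usage: a sequence and a slowed-down copy (every frame repeated twice).
rng = np.random.default_rng(0)
sequence = rng.normal(size=(4, 5, 3))  # 4 frames, 5 landmarks in 3D
fast = gram_trajectory(sequence)
slow = gram_trajectory(np.repeat(sequence, 2, axis=0))
print(dtw_dissimilarity(fast, slow))
```

In this toy setting the slowed-down copy aligns perfectly with the original, which is the rate-invariance property the warping step is meant to provide. Replacing `frobenius_distance` with a geometry-aware distance changes only the `dist` argument.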
