Human action recognition using Pose-based discriminant embedding

Manifold learning is an efficient approach for recognizing human actions. Most of the previous embedding methods are learned based on the distances between frames as data points. Thus they may be efficient in the frame recognition framework, but they will not guarantee to give optimum results when sequences are to be classified as in the case of action recognition in which temporal constraints convey important information. In the sequence recognition framework, sequences are compared based on the distances defined between sets of points. Among them Spatio-temporal Correlation Distance (SCD) is an efficient measure for comparing ordered sequences. In this paper we propose a novel embedding which is optimum in the sequence recognition framework based on SCD as the distance measure. Specifically, the proposed embedding minimizes the sum of the distances between intra-class sequences while seeking to maximize the sum of distances between inter-class points. Action sequences are represented by key poses chosen equidistantly from one action period. The action period is computed by a modified correlation-based method. Action recognition is achieved by comparing the projected sequences in the low-dimensional subspace using SCD or Hausdorff distance in a nearest neighbor framework. Several experiments are carried out on three popular datasets. The method is shown not only to classify the actions efficiently obtaining results comparable to the state of the art on all datasets, but also to be robust to additive noise and tolerant to occlusion, deformation and change in view point. Moreover, the method outperforms other classical dimension reduction techniques and performs faster by choosing less number of postures.

[1]  Mubarak Shah,et al.  A 3-dimensional sift descriptor and its application to action recognition , 2007, ACM Multimedia.

[2]  Ashok Veeraraghavan,et al.  The Function Space of an Activity , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[3]  Wei Liang,et al.  Incremental discriminative-analysis of canonical correlations for action recognition , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[4]  Andrew Gilbert,et al.  Scale Invariant Action Recognition Using Compound Features Mined from Dense Spatio-temporal Corners , 2008, ECCV.

[5]  Yan Meng,et al.  Human activity recognition in video using a hierarchical probabilistic latent model , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[6]  Liang Wang,et al.  Recognizing Human Activities from Silhouettes: Motion Subspace and Factorial Discriminative Graphical Model , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Zicheng Liu,et al.  Expandable Data-Driven Graphical Modeling of Human Actions Based on Salient Postures , 2008, IEEE Transactions on Circuits and Systems for Video Technology.

[8]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[9]  Zhu Li,et al.  Real-time human action recognition by luminance field trajectory analysis , 2008, ACM Multimedia.

[10]  Cristian Sminchisescu,et al.  Generative modeling for continuous non-linearly embedded visual inference , 2004, ICML.

[11]  Mubarak Shah,et al.  Chaotic Invariants for Human Action Recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[12]  Yannis Avrithis,et al.  Dense saliency-based spatiotemporal feature points for action recognition , 2009, CVPR.

[13]  Avinash C. Kak,et al.  PCA versus LDA , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[14]  Dani Lischinski,et al.  A Closed-Form Solution to Natural Image Matting , 2008 .

[15]  Mubarak Shah,et al.  Recognizing human actions , 2005, VSSN@MM.

[16]  Maneesh Agrawala,et al.  Interactive video cutout , 2005, SIGGRAPH 2005.

[17]  Larry S. Davis,et al.  Recognizing actions by shape-motion prototype trees , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[18]  Greg Mori,et al.  Action recognition by learning mid-level motion features , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  David J. Kriegman,et al.  Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection , 1996, ECCV.

[20]  Janusz Konrad,et al.  Action Recognition in Video by Sparse Representation on Covariance Manifolds of Silhouette Tunnels , 2010, ICPR Contests.

[21]  H. Hotelling Relations Between Two Sets of Variates , 1936 .

[22]  Yang Song,et al.  Unsupervised Learning of Human Motion , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[23]  David J. Kriegman,et al.  Recognition using class specific linear projection , 1997 .

[24]  Dit-Yan Yeung,et al.  Human action recognition using Local Spatio-Temporal Discriminant Embedding , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Josef Kittler,et al.  Discriminative Learning and Recognition of Image Set Classes Using Canonical Correlations , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Rama Chellappa,et al.  Machine Recognition of Human Activities: A Survey , 2008, IEEE Transactions on Circuits and Systems for Video Technology.

[27]  I. Jolliffe Principal Component Analysis , 2002 .

[28]  A. Elgammal,et al.  Inferring 3D body pose from silhouettes using activity manifold learning , 2004, CVPR 2004.

[29]  Yu Zhong,et al.  Action recognition in spatiotemporal volume , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[30]  Juan Carlos Niebles,et al.  Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words , 2006, BMVC.

[31]  Liang Wang,et al.  Learning and Matching of Dynamic Shape Manifolds for Human Action Recognition , 2007, IEEE Transactions on Image Processing.

[32]  Luc Van Gool,et al.  Action snippets: How many frames does human action recognition require? , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Maneesh Agrawala,et al.  Inside interactive video cutout , 2005, SIGGRAPH '05.

[34]  Juan Carlos Niebles,et al.  A Hierarchical Model of Shape and Appearance for Human Action Classification , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Michael J. Black,et al.  Parameterized Modeling and Recognition of Activities , 1999, Comput. Vis. Image Underst..

[36]  Adriana Kovashka,et al.  Learning a hierarchy of discriminative space-time neighborhood features for human action recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[37]  S. Kollias,et al.  Dense saliency-based spatiotemporal feature points for action recognition , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Juan Carlos Niebles,et al.  Unsupervised Learning of Human Actions Using Spatial-Temporal Words , 2006 .

[39]  Sebastian Nowozin,et al.  Discriminative Subsequence Mining for Action Classification , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[40]  Liang-Tien Chia,et al.  Motion Context: A New Representation for Human Action Recognition , 2008, ECCV.

[41]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[42]  Ronen Basri,et al.  Actions as space-time shapes , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[43]  Martial Hebert,et al.  Efficient visual event detection using volumetric features , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[44]  Tieniu Tan,et al.  Recent developments in human motion analysis , 2003, Pattern Recognit..

[45]  G. Stewart Matrix Algorithms, Volume II: Eigensystems , 2001 .

[46]  Ahmed M. Elgammal,et al.  Putting local features on a manifold , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[47]  Manuel Menezes de Oliveira Neto,et al.  Shared Sampling for Real‐Time Alpha Matting , 2010, Comput. Graph. Forum.

[48]  Xiaofei He,et al.  Locality Preserving Projections , 2003, NIPS.

[49]  Jitendra Malik,et al.  Recognizing action at a distance , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[50]  Sudeep Sarkar,et al.  Distribution-Based Dimensionality Reduction Applied to Articulated Motion Recognition , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.