Action Recognition in Video by Sparse Representation on Covariance Manifolds of Silhouette Tunnels

A novel framework for action recognition in video using empirical covariance matrices of bags of low-dimensional feature vectors is developed. The feature vectors are extracted from segments of silhouette tunnels of moving objects and coarsely capture their shapes. The matrix logarithm is used to map the segment covariance matrices, which live in a nonlinear Riemannian manifold, to the vector space of symmetric matrices. A recently developed sparse linear representation framework for dictionary-based classification is then applied to the log-covariance matrices. The log-covariance matrix of a query segment is approximated by a sparse linear combination of the log-covariance matrices of training segments and the sparse coefficients are used to determine the action label of the query segment. This approach is tested on the Weizmann and the UT-Tower human action datasets. The new approach attains a segment-level classification rate of 96.74% for the Weizmann dataset and 96.15% for the UT-Tower dataset. Additionally, the proposed method is computationally and memory efficient and easy to implement.

[1]  N. Ayache,et al.  Log‐Euclidean metrics for fast and simple calculus on diffusion tensors , 2006, Magnetic resonance in medicine.

[2]  Allen Y. Yang,et al.  Robust Face Recognition via Sparse Representation , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Janusz Konrad,et al.  Action Recognition in Video by Covariance Matching of Silhouette Tunnels , 2009, 2009 XXII Brazilian Symposium on Computer Graphics and Image Processing.

[4]  Ronen Basri,et al.  Actions as space-time shapes , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[5]  Thad Starner,et al.  Visual Recognition of American Sign Language Using Hidden Markov Models. , 1995 .

[6]  Juan Carlos Niebles,et al.  Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words , 2006, BMVC.

[7]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[8]  Cordelia Schmid,et al.  Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[10]  D. Donoho For most large underdetermined systems of linear equations the minimal 𝓁1‐norm solution is also the sparsest solution , 2006 .

[11]  Junji Yamato,et al.  Recognizing human action in time-sequential images using hidden Markov model , 1992, Proceedings 1992 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[12]  Peyman Milanfar,et al.  Action Recognition from One Example , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Mubarak Shah,et al.  Human Action Recognition in Videos Using Kinematic Features and Multiple Instance Learning , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.