Human Action Recognition under Log-Euclidean Riemannian Metric

This paper presents a new action recognition approach based on local spatio-temporal features. Its contributions are twofold. First, a new local spatio-temporal descriptor is proposed to represent the cuboids detected in video sequences. The descriptor uses a covariance matrix to capture the correlations among the low-level features within each cuboid. Since covariance matrices do not lie in a Euclidean space, the Log-Euclidean Riemannian metric is used to measure the distance between them. Second, the Earth Mover's Distance (EMD) is used to match any pair of video sequences. In contrast to the widely used Euclidean distance, EMD is more robust when matching histograms or distributions of different sizes. Experimental results on two datasets demonstrate the effectiveness of the proposed approach.
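As an illustration of the first contribution, the following is a minimal sketch of a covariance descriptor and the Log-Euclidean distance between two such descriptors, using the standard definition d(A, B) = ||log(A) - log(B)||_F. It is not the authors' implementation: the function names, the `eps` regularizer, and the choice of low-level features are assumptions made here for illustration only.

```python
import numpy as np


def covariance_descriptor(features, eps=1e-6):
    """Covariance matrix of low-level features sampled inside one cuboid.

    features: array of shape (n_samples, d), one d-dimensional feature
    vector per sample. Which features to use is an assumption here.
    """
    features = np.asarray(features, dtype=float)
    cov = np.atleast_2d(np.cov(features, rowvar=False))
    # Assumption: a small ridge keeps the matrix strictly positive
    # definite so that its matrix logarithm is well defined.
    return cov + eps * np.eye(cov.shape[0])


def spd_log(mat):
    """Matrix logarithm of a symmetric positive-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    # V * diag(log(w)) * V^T, written with broadcasting over columns.
    return (eigvecs * np.log(eigvals)) @ eigvecs.T


def log_euclidean_distance(a, b):
    """Frobenius norm of the difference of the matrix logarithms."""
    return float(np.linalg.norm(spd_log(a) - spd_log(b), ord="fro"))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two hypothetical cuboids, each with 200 samples of 5 features.
    c1 = covariance_descriptor(rng.normal(size=(200, 5)))
    c2 = covariance_descriptor(2.0 * rng.normal(size=(200, 5)))
    print(log_euclidean_distance(c1, c2))
```

Because the matrix logarithm maps symmetric positive-definite matrices into the vector space of symmetric matrices, distances computed this way can be used wherever an ordinary Euclidean distance is expected, for example as the ground distance when comparing sets of descriptors.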
