Statistical adaptive metric learning in visual action feature set recognition

Large variance in visual features poses a significant challenge for human action recognition. To address this problem, this paper proposes a statistical adaptive metric learning (SAML) method that explores selections and combinations of multiple statistics within a unified metric learning framework. Most statistics have advantages in specific controlled environments, and systematic selection and combination can adapt them to more realistic "in the wild" scenarios. In the proposed method, multiple statistics, including means, covariance matrices and Gaussian distributions, are explicitly mapped to or generated on Riemannian manifolds. Typically, d-dimensional mean vectors in $\mathbb{R}^d$ are mapped to the space of symmetric positive definite (SPD) matrices $\mathrm{Sym}_d^+ \subset \mathbb{R}^{d \times d}$. Subsequently, by embedding the heterogeneous manifolds in their tangent Hilbert space, the subspace combination with minimal deviation is selected from the multiple statistics. Mahalanobis metrics are then introduced to map them back into Euclidean space, and a unified optimization is finally performed on the Euclidean distances. Because subspaces with smaller deviations are selected before metric learning, exploring different metric combinations makes the final learning more representative and effective than exhaustively learning from all the hybrid metrics. Experimental evaluations are conducted on human action recognition in both static and dynamic scenarios. Promising results demonstrate that the proposed method performs effectively for human action recognition in the wild.

A statistical adaptive metric learning (SAML) method is proposed to classify action features. SAML explores multiple statistic combinations for feature sets at different scales. A discriminative statistic subspace is learned by a unified metric learning framework. Highly competitive performance is achieved by SAML on five benchmark databases.
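The statistics named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation, and the abstract does not give the exact mappings, so several choices here are assumptions: the mean is lifted to an SPD matrix as a ridge-regularized outer product, the Gaussian is embedded as a (d+1)×(d+1) SPD matrix (a common construction in the image-set literature), and the tangent-space embedding uses the matrix logarithm (log-Euclidean). The function names `spd_log`, `set_statistics` and `log_euclidean_distance` and the parameter `eps` are hypothetical.

```python
import numpy as np


def spd_log(S):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    # V @ diag(log w) @ V.T, written with broadcasting over columns
    return (V * np.log(w)) @ V.T


def set_statistics(X, eps=1e-3):
    """Map a feature set X of shape (n, d), n >= 2, to three SPD matrices.

    Returns (M, C, G): mean-based, covariance, and Gaussian embeddings.
    All three constructions are illustrative assumptions.
    """
    n, d = X.shape
    m = X.mean(axis=0)
    # Covariance with a small ridge so that it is strictly positive definite
    C = np.cov(X, rowvar=False) + eps * np.eye(d)
    # Assumed lifting of the mean vector to Sym_d^+: outer product plus ridge
    M = np.outer(m, m) + eps * np.eye(d)
    # Assumed Gaussian embedding: [[C + m m^T, m], [m^T, 1]]
    G = np.empty((d + 1, d + 1))
    G[:d, :d] = C + np.outer(m, m)
    G[:d, d] = m
    G[d, :d] = m
    G[d, d] = 1.0
    return M, C, G


def log_euclidean_distance(S1, S2):
    """Frobenius distance between two SPD matrices in the log (tangent) domain."""
    return np.linalg.norm(spd_log(S1) - spd_log(S2), ord="fro")
```

Under these assumptions, two feature sets can be compared per statistic, for example `log_euclidean_distance(set_statistics(X)[1], set_statistics(Y)[1])` for the covariance term. Selecting the subspace combination and learning the Mahalanobis metrics, as the abstract describes, would operate on top of such tangent-space representations and is not shown here.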
