Paper information - Spectral learning of latent semantics for action recognition

Spectral learning of latent semantics for action recognition

This paper proposes novel spectral methods for learning latent semantics (i.e., high-level features) from a large vocabulary of abundant mid-level features (i.e., visual keywords), which helps to bridge the semantic gap in the challenging task of action recognition. To discover the manifold structure hidden among mid-level features, we develop spectral embedding approaches based on graphs and hypergraphs that require no parameter tuning for graph construction, a key step of manifold learning. In particular, the traditional graphs are constructed by linear reconstruction with sparse coding. In the new embedding space, we learn high-level latent semantics automatically from the abundant mid-level features through spectral clustering. The learnt latent semantics can be readily used for action recognition with an SVM by defining a histogram intersection kernel. Unlike traditional latent semantic analysis based on topic models, our two spectral methods for semantic learning can discover the manifold structure hidden among mid-level features, which results in compact but discriminative high-level features. Experimental results on two standard action datasets show the superior performance of our spectral methods.
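As a rough illustration of the pipeline outlined in the abstract (graph over visual keywords, spectral embedding, spectral clustering into latent semantics, histogram intersection kernel), the following Python sketch may help. It is not the authors' implementation. In particular, it assumes a plain cosine-similarity graph as a stand-in for the sparse-coding graph and the hypergraph, and the toy histogram `H` and all function names are hypothetical.

```python
import numpy as np

def spectral_embedding(W, k):
    """Row-normalized top-k eigenvectors of D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)              # eigenvalues in ascending order
    E = vecs[:, -k:]                         # eigenvectors of the k largest
    norms = np.linalg.norm(E, axis=1, keepdims=True)
    return E / np.maximum(norms, 1e-12)

def kmeans(X, k, n_iter=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def latent_histograms(H, labels, k):
    """Pool keyword counts per cluster: (videos x keywords) -> (videos x k)."""
    Z = np.zeros((H.shape[0], k))
    for j in range(k):
        Z[:, j] = H[:, labels == j].sum(axis=1)
    return Z

def histogram_intersection_kernel(A, B):
    """K[i, j] = sum_t min(A[i, t], B[j, t])."""
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)

# Toy data (assumption): 4 videos described by counts of 6 visual keywords.
H = np.array([[3, 2, 4, 0, 0, 0],
              [1, 2, 1, 0, 0, 0],
              [0, 0, 0, 2, 3, 1],
              [0, 0, 0, 4, 1, 2]], dtype=float)

# Stand-in graph (assumption): cosine similarity between keyword columns.
cols = H.T / np.linalg.norm(H.T, axis=1, keepdims=True)
W = cols @ cols.T

k = 2                                        # number of latent semantics
labels = kmeans(spectral_embedding(W, k), k) # spectral clustering of keywords
Z = latent_histograms(H, labels, k)          # compact high-level features
K = histogram_intersection_kernel(Z, Z)      # Gram matrix for a kernel SVM
```

The Gram matrix `K` could then be passed to an SVM that accepts precomputed kernels, for instance scikit-learn's `SVC(kernel="precomputed")`. Replacing the cosine graph with a sparse-reconstruction graph would bring the sketch closer to the setting the abstract describes.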

Zhiwu Lu | Yuxin Peng | Horace Ho-Shing Ip
