Multi-task human action recognition via exploring super-category

Action categories are related to one another, and correlated categories can be clustered into a group that we call a super-category. Sharing knowledge within a super-category is an effective strategy for improving generalization. In this paper, we propose a novel human action recognition method based on a multi-task learning framework with super-categories. We use the Fisher vector as the action representation, formed by concatenating the gradients of the log-likelihood with respect to the mean vectors and covariance parameters of a Gaussian Mixture Model. Because the occupancy probability of each Gaussian component differs, we discover the relationships among action categories by evaluating the importance of each Gaussian component in classifying each category. The more strongly two categories depend on the same Gaussian components, the more likely they are to belong to the same super-category, and vice versa. Using the discovered super-category information as a prior, the multi-task learning framework simultaneously encourages feature sharing within each super-category and feature competition between super-categories. Experimental results on the large, realistic datasets HMDB51 and UCF50 show that the proposed method achieves higher accuracy with fewer feature dimensions than several state-of-the-art approaches.

Highlights:
A novel action recognition approach is proposed based on the MTL framework with super-category.
Super-category is explored by measuring the similarity among action categories.
Feature sharing and competition are encouraged simultaneously with super-category.
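The Fisher vector representation described in the abstract can be illustrated with a minimal sketch. It assumes a diagonal-covariance Gaussian Mixture Model that has already been fitted, and it uses the standard normalized gradients with respect to the means and standard deviations as given in the improved Fisher kernel literature. The function name `fisher_vector` and its parameter names are hypothetical, and this is not necessarily the authors' exact implementation (for example, any power or L2 normalization they may apply is omitted).

```python
import numpy as np

def fisher_vector(X, weights, means, sigmas):
    """Fisher vector of local descriptors X under a diagonal GMM (illustrative sketch).

    X:       (T, D) local descriptors from one video
    weights: (K,)   mixture weights, summing to 1
    means:   (K, D) component mean vectors
    sigmas:  (K, D) component standard deviations (diagonal covariance)
    Returns a vector of length 2 * K * D.
    """
    T, D = X.shape
    # Deviations normalized by the standard deviation, shape (T, K, D)
    Z = (X[:, None, :] - means[None, :, :]) / sigmas[None, :, :]
    # log(w_k * N(x_t | mu_k, diag(sigma_k^2))), shape (T, K)
    log_p = (np.log(weights)[None, :]
             - 0.5 * np.sum(Z ** 2, axis=2)
             - np.sum(np.log(sigmas), axis=1)[None, :]
             - 0.5 * D * np.log(2.0 * np.pi))
    # Posterior (occupancy) probability of each component, via a stable softmax
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # Normalized gradients with respect to the means and the standard deviations
    g_mu = np.einsum('tk,tkd->kd', gamma, Z) / (T * np.sqrt(weights)[:, None])
    g_sigma = np.einsum('tk,tkd->kd', gamma, Z ** 2 - 1.0) / (T * np.sqrt(2.0 * weights)[:, None])
    # Concatenate both gradient blocks into a single representation
    return np.concatenate([g_mu.ravel(), g_sigma.ravel()])
```

With K components and D-dimensional descriptors the result has 2KD entries, one block of size 2D per Gaussian component. This block structure is what makes it possible to assess how much each component contributes to classifying each category.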
