Multi-task Sparse Learning with Beta Process Prior for Action Recognition

In this paper, we formulate human action recognition as a novel Multi-Task Sparse Learning(MTSL) framework which aims to construct a test sample with multiple features from as few bases as possible. Learning the sparse representation under each feature modality is considered as a single task in MTSL. Since the tasks are generated from multiple features associated with the same visual input, they are not independent but inter-related. We introduce a Beta process(BP) prior to the hierarchical MTSL model, which efficiently learns a compact dictionary and infers the sparse structure shared across all the tasks. The MTSL model enforces the robustness in coefficient estimation compared with performing each task independently. Besides, the sparseness is achieved via the Beta process formulation rather than the computationally expensive L1 norm penalty. In terms of non-informative gamma hyper-priors, the sparsity level is totally decided by the data. Finally, the learning problem is solved by Gibbs sampling inference which estimates the full posterior on the model parameters. Experimental results on the KTH and UCF sports datasets demonstrate the effectiveness of the proposed MTSL approach for action recognition.

[1]  Shuicheng Yan,et al.  Visual classification with multi-task joint sparse representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[2]  Tanaya Guha,et al.  Learning Sparse Representations for Human Action Recognition , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Haibin Ling,et al.  Robust Visual Tracking and Vehicle Classification via Sparse Representation , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Trevor Darrell,et al.  An efficient projection for l1, ∞ regularization , 2009, ICML '09.

[5]  Trevor Darrell,et al.  An efficient projection for l 1 , infinity regularization. , 2009, ICML 2009.

[6]  Rama Chellappa,et al.  Sparse dictionary-based representation and recognition of action attributes , 2011, 2011 International Conference on Computer Vision.

[7]  Massimiliano Pontil,et al.  Convex multi-task feature learning , 2008, Machine Learning.

[8]  Narendra Ahuja,et al.  Robust Visual Tracking via Structured Multi-Task Sparse Learning , 2012, International Journal of Computer Vision.

[9]  Adriana Kovashka,et al.  Learning a hierarchy of discriminative space-time neighborhood features for human action recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[10]  Iasonas Kokkinos,et al.  Discovering discriminative action parts from mid-level video representations , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Quoc V. Le,et al.  Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis , 2011, CVPR 2011.

[12]  Janusz Konrad,et al.  Action Recognition in Video by Sparse Representation on Covariance Manifolds of Silhouette Tunnels , 2010, ICPR Contests.

[13]  Cordelia Schmid,et al.  Evaluation of Local Spatio-temporal Features for Action Recognition , 2009, BMVC.

[14]  David B. Dunson,et al.  Multi-Task Compressive Sensing , 2007 .

[15]  Ronald Poppe,et al.  A survey on vision-based human action recognition , 2010, Image Vis. Comput..

[16]  Bruce A. Draper,et al.  Scalable action recognition with a subspace forest , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Ioannis A. Kakadiaris,et al.  Part-based motion descriptor image for human action recognition , 2012, Pattern Recognit..

[18]  Narendra Ahuja,et al.  Robust visual tracking via multi-task sparse learning , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Ivan Laptev,et al.  On Space-Time Interest Points , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[20]  Yi Yang,et al.  Action recognition by exploring data distribution and feature correlation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[22]  Guillermo Sapiro,et al.  Non-Parametric Bayesian Dictionary Learning for Sparse Image Representations , 2009, NIPS.

[23]  Mubarak Shah,et al.  A 3-dimensional sift descriptor and its application to action recognition , 2007, ACM Multimedia.

[24]  Lior Wolf,et al.  Local Trinary Patterns for human action recognition , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[25]  Mubarak Shah,et al.  Action MACH a spatio-temporal Maximum Average Correlation Height filter for action recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  David B. Dunson,et al.  Multitask Compressive Sensing , 2009, IEEE Transactions on Signal Processing.

[27]  Qinghua Hu,et al.  A linear subspace learning approach via sparse coding , 2011, 2011 International Conference on Computer Vision.

[28]  Cordelia Schmid,et al.  Action recognition by dense trajectories , 2011, CVPR 2011.