Visual attributes based sparse multitask action recognition

In action recognition, traditional multitask learning can effectively share low-level features among actions, but it neglects the high-level semantic relationships between latent visual attributes and actions. Some action classes might be related in that they share latent visual attributes across categories. In this paper, we improve the multitask learning model by exploiting attribute-action relationships for action datasets with sparse and incomplete labels. Moreover, because visual attributes and action class labels carry different amounts of semantic information, we carry out attribute task learning and action task learning separately to improve generalization performance. Specifically, for the two latent variables, i.e., visual attributes and model parameters, we formulate a joint optimization objective function regularized by low rank and sparsity. Because this objective is non-convex, we transform it into a convex formulation by introducing an auxiliary variable. Experimental results on two datasets show that the proposed approach learns latent knowledge effectively, which enhances discriminative power, and is competitive with the baseline methods.
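As a rough illustration of what "regularized by low rank and sparsity" can mean in practice, the sketch below shows the two standard convex surrogates for these penalties, the nuclear norm and the L1 norm, together with their proximal operators. This is not the objective or solver from the paper, which the abstract does not spell out; the function names, the weights `lam_rank` and `lam_sparse`, and the choice of surrogates are assumptions made only for illustration.

```python
import numpy as np


def soft_threshold(W, tau):
    """Proximal operator of tau * ||W||_1 (elementwise shrinkage toward zero)."""
    return np.sign(W) * np.maximum(np.abs(W) - tau, 0.0)


def singular_value_threshold(W, tau):
    """Proximal operator of tau * ||W||_* (shrinks the singular values of W)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    # Scale the columns of U by the shrunken singular values, then recombine.
    return (U * s_shrunk) @ Vt


def regularizer(W, lam_rank, lam_sparse):
    """Hypothetical combined penalty: nuclear norm (low rank) plus L1 norm (sparsity)."""
    return lam_rank * np.linalg.norm(W, "nuc") + lam_sparse * np.abs(W).sum()
```

Solvers of the alternating-direction type typically split such a penalty with an auxiliary copy of the variable, so that each proximal step above can be applied to its own copy; whether the paper's auxiliary variable plays exactly this role is not stated in the abstract.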
