Learning to Share Latent Tasks for Action Recognition

Sharing knowledge for multiple related machine learning tasks is an effective strategy to improve the generalization performance. In this paper, we investigate knowledge sharing across categories for action recognition in videos. The motivation is that many action categories are related, where common motion pattern are shared among them (e.g. diving and high jump share the jump motion). We propose a new multi-task learning method to learn latent tasks shared across categories, and reconstruct a classifier for each category from these latent tasks. Compared to previous methods, our approach has two advantages: (1) The learned latent tasks correspond to basic motion patterns instead of full actions, thus enhancing discrimination power of the classifiers. (2) Categories are selected to share information with a sparsity regularizer, avoiding falsely forcing all categories to share knowledge. Experimental results on multiple public data sets show that the proposed approach can effectively transfer knowledge between different action categories to improve the performance of conventional single task learning methods.

[1]  Yang Wang,et al.  Hidden Part Models for Human Action Recognition: Probabilistic versus Max Margin , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Juan Carlos Niebles,et al.  Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification , 2010, ECCV.

[3]  乔宇 Motionlets: Mid-Level 3D Parts for Human Motion Recognition , 2013 .

[4]  Trevor Darrell,et al.  Sparselet Models for Efficient Multiclass Object Detection , 2012, ECCV.

[5]  Matthijs Douze,et al.  Large-scale image classification with trace-norm regularization , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Jiayu Zhou,et al.  Clustered Multi-Task Learning Via Alternating Structure Optimization , 2011, NIPS.

[7]  Bingbing Ni,et al.  Recognizing human group activities with localized causalities , 2009, CVPR 2009.

[8]  Hal Daumé,et al.  Learning Task Grouping and Overlap in Multi-task Learning , 2012, ICML.

[9]  Eric P. Xing,et al.  Tree-Guided Group Lasso for Multi-Task Regression with Structured Sparsity , 2009, ICML.

[10]  Antonio Torralba,et al.  Sharing Visual Features for Multiclass and Multiview Object Detection , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Marc Teboulle,et al.  A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems , 2009, SIAM J. Imaging Sci..

[12]  Zicheng Liu,et al.  Cross-dataset action detection , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[13]  Yang Wang,et al.  Discriminative Latent Models for Recognizing Contextual Group Activities , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Cordelia Schmid,et al.  Actions in context , 2009, CVPR.

[15]  Deva Ramanan,et al.  Steerable part models , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Cordelia Schmid,et al.  Action recognition by dense trajectories , 2011, CVPR 2011.

[17]  Joshua B. Tenenbaum,et al.  Learning to share visual appearance for multiclass object detection , 2011, CVPR 2011.

[18]  Mubarak Shah,et al.  Recognizing 50 human action categories of web videos , 2012, Machine Vision and Applications.

[19]  Leonidas J. Guibas,et al.  Human action recognition by learning bases of action attributes and parts , 2011, 2011 International Conference on Computer Vision.

[20]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Mark Everingham,et al.  Shared parts for deformable part-based models , 2011, CVPR 2011.

[22]  Cordelia Schmid,et al.  Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Ming-Wei Chang,et al.  Learning shared body plans , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Fei-Fei Li,et al.  Learning latent temporal structure for complex event detection , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Tal Hassner,et al.  Motion Interchange Patterns for Action Recognition in Unconstrained Videos , 2012, ECCV.

[26]  Charless C. Fowlkes,et al.  Bilinear classifiers for visual recognition , 2009, NIPS.

[27]  Silvio Savarese,et al.  Recognizing human actions by attributes , 2011, CVPR 2011.

[28]  Jason J. Corso,et al.  Action bank: A high-level representation of activity in video , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Massimiliano Pontil,et al.  Regularized multi--task learning , 2004, KDD.

[30]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[31]  Massimiliano Pontil,et al.  Convex multi-task feature learning , 2008, Machine Learning.

[32]  Rich Caruana,et al.  Multitask Learning , 1997, Machine-mediated learning.