Evaluation of regularized multi-task learning algorithms for single/multi-view human action recognition

Regularized multi-task learning (MTL) algorithms, which can fully exploit the relationships among different related tasks, have gradually been adopted in pattern recognition and computer vision, and many effective approaches based on regularized MTL have been proposed. Although promising results in human action recognition have been achieved over the past decades, most existing action recognition algorithms focus on action descriptors, single/multi-view recognition, or multi-modality recognition; few works involve MTL, and in particular a systematic evaluation of existing MTL algorithms for human action recognition is lacking. Therefore, in this paper, seven popular regularized MTL algorithms, in which different actions are treated as different tasks, are systematically evaluated on two public multi-view action datasets. Specifically, dense trajectory features are first extracted for each view, a shared codebook is then constructed across all views by k-means, and each video is coded with this shared codebook. Moreover, depending on the regularized MTL algorithm, either all actions or only subsets of actions are considered related, and these actions are assigned to different tasks in MTL. Furthermore, the effect of the number of training samples drawn from different action views is also evaluated for MTL. Large-scale experimental results show that: 1) regularized MTL is very useful for action recognition, since it can uncover the latent relationships among different actions; 2) not all human actions are related, and if unrelated actions are grouped together in MTL, performance degrades; 3) as the number of training samples from different views increases, the relationships among different actions can be more fully exploited, which improves action recognition accuracy.
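The pipeline described above (dense-trajectory features per view → shared k-means codebook → bag-of-words coding → regularized MTL with one task per action) can be sketched as follows. This is a minimal numpy illustration under stated assumptions, not the authors' implementation: the function names are invented, the gradient-descent solver is a simplification, and the `w_t = w0 + v_t` decomposition follows the classic Evgeniou–Pontil formulation of regularized MTL, which is only one of the seven algorithms the paper evaluates.

```python
import numpy as np

def encode_video(descriptors, codebook):
    """Bag-of-words coding: assign each local (e.g. dense-trajectory)
    descriptor to its nearest codeword; return an L1-normalised histogram."""
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    hist = np.bincount(dists.argmin(axis=1),
                       minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)

def regularized_mtl(X_tasks, y_tasks, lam1=0.1, lam2=0.1, lr=0.1, iters=1000):
    """Regularized MTL in the Evgeniou-Pontil style: each task's weight
    vector is w_t = w0 + v_t, with w0 shared across all tasks.  Minimises
        sum_t (1/n_t) ||X_t (w0 + v_t) - y_t||^2
        + lam1 ||w0||^2 + lam2 sum_t ||v_t||^2
    by plain gradient descent (a sketch, not an optimised solver)."""
    T, d = len(X_tasks), X_tasks[0].shape[1]
    w0, V = np.zeros(d), np.zeros((T, d))
    for _ in range(iters):
        g0 = np.zeros(d)
        for t in range(T):
            resid = X_tasks[t] @ (w0 + V[t]) - y_tasks[t]
            gt = X_tasks[t].T @ resid / len(y_tasks[t])
            V[t] -= lr * (gt + lam2 * V[t])   # task-specific update
            g0 += gt / T
        w0 -= lr * (g0 + lam1 * w0)           # shared update
    return w0, V
```

The shared component `w0` is what lets related tasks (here, related actions) borrow strength from one another; the regularizers `lam1`/`lam2` control how much each task may deviate from the shared model, which is exactly the knob the paper's second finding concerns: forcing unrelated actions to share `w0` hurts performance.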
