论文信息 - Jointly Learning Heterogeneous Features for RGB-D Activity Recognition

Jointly Learning Heterogeneous Features for RGB-D Activity Recognition

In this paper, we focus on heterogeneous features learning for RGB-D activity recognition. We find that features from different channels (RGB, depth) could share some similar hidden structures, and then propose a joint learning model to simultaneously explore the shared and feature-specific components as an instance of heterogeneous multi-task learning. The proposed model formed in a unified framework is capable of: 1) jointly mining a set of subspaces with the same dimensionality to exploit latent shared features across different feature channels, 2) meanwhile, quantifying the shared and feature-specific components of features in the subspaces, and 3) transferring feature-specific intermediate transforms (i-transforms) for learning fusion of heterogeneous features across datasets. To efficiently train the joint model, a three-step iterative optimization algorithm is proposed, followed by a simple inference model. Extensive experimental results on four activity datasets have demonstrated the efficacy of the proposed method. A new RGB-D activity dataset focusing on human-object interaction is further contributed, which presents more challenges for RGB-D activity benchmarking.

[1] Tong Zhang,et al. A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , 2005, J. Mach. Learn. Res..

[2] Meinard Müller,et al. Motion templates for automatic classification and retrieval of motion capture data , 2006, SCA '06.

[3] Ramakant Nevatia,et al. Recognition and Segmentation of 3-D Human Action Using HMM and Multi-class AdaBoost , 2006, ECCV.

[4] Shimon Ullman,et al. Uncovering shared structures in multiclass classification , 2007, ICML '07.

[5] Christopher Joseph Pal,et al. Activity recognition using the velocity histories of tracked keypoints , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[6] Eric P. Xing,et al. Heterogeneous multitask learning with joint sparsity constraints , 2009, NIPS.

[7] Jiebo Luo,et al. Heterogeneous feature machines for visual recognition , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[8] Wanqing Li,et al. Action recognition based on a bag of 3D points , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[9] Dit-Yan Yeung,et al. A Convex Formulation for Learning Task Relationships in Multi-Task Learning , 2010, UAI.

[10] Cordelia Schmid,et al. Action recognition by dense trajectories , 2011, CVPR 2011.

[11] Bart Selman,et al. Human Activity Detection from RGBD Images , 2011, Plan, Activity, and Intent Recognition.

[12] Dit-Yan Yeung,et al. Multi-Task Learning in Heterogeneous Feature Spaces , 2011, AAAI.

[13] Xiaodong Yang,et al. EigenJoints-based action recognition using Naïve-Bayes-Nearest-Neighbor , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[14] Fei-Fei Li,et al. Action Recognition with Exemplar Based 2.5D Graph Matching , 2012, ECCV.

[15] Fei-Fei Li,et al. Recognizing Human-Object Interactions in Still Images by Modeling the Mutual Context of Objects and Human Poses , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16] Dieter Fox,et al. Fine-grained kitchen activity recognition using RGB-D , 2012, UbiComp.

[17] Changyin Sun,et al. Supervised class-specific dictionary learning for sparse modeling in action recognition , 2012, Pattern Recognit..

[18] Lu Yang,et al. Combing RGB and Depth Map Features for human activity recognition , 2012, Proceedings of The 2012 Asia Pacific Signal and Information Processing Association Annual Summit and Conference.

[19] Jake K. Aggarwal,et al. View invariant human action recognition using histograms of 3D joints , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[20] Lawrence Carin,et al. Cross-Domain Multitask Learning with Latent Probit Models , 2012, ICML.

[21] Yale Song,et al. Multi-view latent variable discriminative models for action recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[22] Ying Wu,et al. Robust 3D Action Recognition with Random Occupancy Patterns , 2012, ECCV.

[23] Tanaya Guha,et al. Learning Sparse Representations for Human Action Recognition , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24] Ruzena Bajcsy,et al. Sequence of the Most Informative Joints (SMIJ): A new representation for human skeletal action recognition , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[25] Yi Yang,et al. Action recognition by exploring data distribution and feature correlation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[26] Bingbing Ni,et al. Order-Preserving Sparse Coding for Sequence Classification , 2012, ECCV.

[27] Ling Shao,et al. Learning Discriminative Representations from RGB-D Video Data , 2013, IJCAI.

[28] Zicheng Liu,et al. HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[29] Guodong Guo,et al. Fusing Spatiotemporal Features and Joints for 3D Action Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[30] Qiang Zhou,et al. Learning to Share Latent Tasks for Action Recognition , 2013, 2013 IEEE International Conference on Computer Vision.

[31] Cristian Sminchisescu,et al. The Moving Pose: An Efficient 3D Kinematics Descriptor for Low-Latency Action Recognition and Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[32] Alexandros André Chaaraoui,et al. Fusion of Skeletal and Silhouette-Based Features for Human Action Recognition with RGB-D Devices , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[33] Jieping Ye,et al. A Convex Formulation for Learning a Shared Predictive Structure from Multiple Tasks , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34] Hairong Qi,et al. Group Sparsity and Geometry Constrained Dictionary Learning for Action Recognition from Depth Maps , 2013, 2013 IEEE International Conference on Computer Vision.

[35] Jake K. Aggarwal,et al. Spatio-temporal Depth Cuboid Similarity Feature for Activity Recognition Using Depth Camera , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[36] Nanning Zheng,et al. Modeling 4D Human-Object Interactions for Event and Object Recognition , 2013, 2013 IEEE International Conference on Computer Vision.

[37] Marwan Torki,et al. Human Action Recognition Using a Temporal Hierarchy of Covariance Descriptors on 3D Joint Locations , 2013, IJCAI.

[38] Wotao Yin,et al. A feasible method for optimization with orthogonality constraints , 2013, Math. Program..

[39] Hema Swetha Koppula,et al. Learning human activities and object affordances from RGB-D videos , 2012, Int. J. Robotics Res..

[40] Andrew W. Fitzgibbon,et al. Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[41] Rama Chellappa,et al. Human Action Recognition by Representing 3D Skeletons as Points in a Lie Group , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[42] Gang Wang,et al. Multi-modal feature fusion for action recognition in RGB-D sequences , 2014, 2014 6th International Symposium on Communications, Control and Signal Processing (ISCCSP).

[43] Subramanian Ramanathan,et al. Multitask Linear Discriminant Analysis for View Invariant Action Recognition , 2014, IEEE Transactions on Image Processing.

[44] Junsong Yuan,et al. Learning Actionlet Ensemble for 3D Human Action Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[45] Guodong Guo,et al. Evaluating spatiotemporal interest point features for depth-based action recognition , 2014, Image Vis. Comput..

[46] Ivor W. Tsang,et al. Learning With Augmented Features for Supervised and Semi-Supervised Heterogeneous Domain Adaptation , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[47] Limin Wang,et al. Multi-view Super Vector for Action Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[48] Cewu Lu,et al. Range-Sample Depth Feature for Action Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[49] Xiaodong Yang,et al. Super Normal Vector for Activity Recognition Using Depth Sequences , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[50] Juan Carlos Niebles,et al. Discriminative Hierarchical Modeling of Spatio-temporally Composable Human Activities , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[51] Yong Du,et al. Hierarchical recurrent neural network for skeleton based action recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52] Jian-Huang Lai,et al. Jointly Learning Heterogeneous Features for RGB-D Activity Recognition , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[53] Jiwen Lu,et al. MMSS: Multi-modal Sharable and Specific Feature Learning for RGB-D Object Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[54] Yun Fu,et al. Bilinear heterogeneous information machine for RGB-D action recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[55] Ling Shao,et al. Structure-Preserving Binary Representations for RGB-D Action Recognition , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[56] Jian-Huang Lai,et al. Exemplar-Based Recognition of Human–Object Interactions , 2016, IEEE Transactions on Circuits and Systems for Video Technology.