Collaborative multimodal feature learning for RGB-D action recognition

Abstract The emergence of cost-effective depth sensors has opened up a new dimension for RGB-D-based human action recognition. In this paper, we propose a collaborative multimodal feature learning (CMFL) model for human action recognition from RGB-D sequences. Specifically, we propose a robust spatio-temporal pyramid feature (RSTPF) to capture dynamic local patterns around each human joint. The proposed CMFL model fuses multimodal data (skeleton, depth and RGB) and learns action classifiers from the fused features. The original low-level feature matrices are factorized to learn shared features and modality-specific features in a supervised fashion. The shared features describe the structures common to the three modalities, while the modality-specific features capture information intrinsic to each modality. We formulate shared-specific feature mining and action classifier learning in a unified max-margin framework, and solve the formulation with an iterative optimization algorithm. Experimental results on four action datasets demonstrate the efficacy of the proposed method.
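The shared-specific factorization idea can be illustrated with a small sketch. It is not the CMFL formulation: the abstract gives no objective, so the sketch assumes a plain unsupervised model X_m ≈ A_m S + B_m P_m with ridge penalties, solved by alternating least squares, and it omits the max-margin classifier term entirely. All names (`shared_specific_factorize`, `ridge_solve`, `k_shared`, `k_specific`, `lam`) are hypothetical.

```python
import numpy as np


def ridge_solve(A, B, lam):
    """Return argmin_Z ||A @ Z - B||_F^2 + lam * ||Z||_F^2."""
    k = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(k), A.T @ B)


def objective(Xs, S, Ps, As, Bs, lam):
    """Reconstruction error plus ridge penalties on all factors."""
    total = lam * np.sum(S ** 2)
    for X, P, A, B in zip(Xs, Ps, As, Bs):
        total += np.sum((X - A @ S - B @ P) ** 2)
        total += lam * (np.sum(A ** 2) + np.sum(B ** 2) + np.sum(P ** 2))
    return total


def shared_specific_factorize(Xs, k_shared=4, k_specific=3,
                              lam=1e-2, n_iter=30, seed=0):
    """Assumed model (not the paper's): X_m ~ A_m @ S + B_m @ P_m.

    Xs : list of (d_m, n) feature matrices, one per modality,
         with samples as columns.
    S  : (k_shared, n) codes shared by all modalities.
    Ps : list of (k_specific, n) modality-specific codes.
    """
    rng = np.random.default_rng(seed)
    n = Xs[0].shape[1]
    S = rng.standard_normal((k_shared, n))
    Ps = [rng.standard_normal((k_specific, n)) for _ in Xs]
    history = []
    for _ in range(n_iter):
        # 1) Update the bases [A_m, B_m] of each modality with codes fixed.
        As, Bs = [], []
        for X, P in zip(Xs, Ps):
            Z = np.vstack([S, P])               # (k_shared + k_specific, n)
            W = ridge_solve(Z.T, X.T, lam).T    # (d_m, k_shared + k_specific)
            As.append(W[:, :k_shared])
            Bs.append(W[:, k_shared:])
        # 2) Update each modality-specific code on the residual X_m - A_m S.
        Ps = [ridge_solve(B, X - A @ S, lam)
              for X, A, B in zip(Xs, As, Bs)]
        # 3) Update the shared code jointly from all modalities by
        #    stacking their bases and residuals row-wise.
        A_stack = np.vstack(As)
        R_stack = np.vstack([X - B @ P for X, B, P in zip(Xs, Bs, Ps)])
        S = ridge_solve(A_stack, R_stack, lam)
        history.append(objective(Xs, S, Ps, As, Bs, lam))
    return S, Ps, As, Bs, history
```

Each of the three updates minimizes the same regularized objective over one block of variables, so the recorded objective does not increase from one iteration to the next. In a supervised variant along the lines the abstract describes, the learned codes would additionally feed a classifier loss optimized in the same loop.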
