RGB-D action recognition based on discriminative common structure learning model

Abstract. The emergence of low-cost depth cameras has created new opportunities for RGB-D based human action recognition. However, most existing RGB-D based approaches simply concatenate the original heterogeneous features without discovering the latent relations among the different modalities. We propose a discriminative common structure learning (DCSL) model for human action recognition from RGB-D sequences. We extract both deep learning-based and hand-crafted features from multimodal data (skeleton, depth, and RGB); in particular, we propose a deep architecture based on a 3-D convolutional neural network that automatically extracts deep spatiotemporal features from raw sequences. The DCSL model uses a generalized version of collective matrix factorization to learn features shared among the modalities. To perform supervised learning and preserve intermodal similarity, we formulate a graph regularization term that considers both the label information and the similar geometric structure of the multimodal data, with the aim of improving the discriminative power of the shared features. We solve the objective function with an iterative optimization algorithm, and then employ an improved collaborative representation classifier to perform computationally efficient action recognition. Experimental results on four action datasets demonstrate the superior performance of the proposed method.
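
The idea of learning shared features by collective matrix factorization with a graph regularizer can be illustrated with a minimal NumPy sketch. It is not the DCSL objective itself, whose exact form the abstract does not give. The sketch assumes each modality matrix X_m (features by samples) is factorized as U_m V with one shared code matrix V, that the graph is built from labels only, and that the weights `lam` and `gamma` and all function names are hypothetical choices for illustration.

```python
import numpy as np

def label_laplacian(labels):
    """Graph Laplacian L = D - W, where W links samples sharing a label (assumed graph)."""
    labels = np.asarray(labels)
    W = (labels[:, None] == labels[None, :]).astype(float)
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W

def objective(Xs, Us, V, L, lam, gamma):
    """Reconstruction error + ridge penalties + graph smoothness of the shared codes."""
    fit = sum(np.linalg.norm(X - U @ V) ** 2 for X, U in zip(Xs, Us))
    reg = lam * (sum(np.linalg.norm(U) ** 2 for U in Us) + np.linalg.norm(V) ** 2)
    return fit + reg + gamma * np.trace(V @ L @ V.T)

def fit_shared(Xs, L, k=5, lam=0.1, gamma=0.1, iters=30, seed=0):
    """Alternating minimization over per-modality bases U_m and shared codes V."""
    rng = np.random.default_rng(seed)
    n = Xs[0].shape[1]
    V = rng.standard_normal((k, n))
    evals, Q = np.linalg.eigh(L)  # L is symmetric, so L = Q diag(evals) Q^T
    I = np.eye(k)
    history = []
    for _ in range(iters):
        # Ridge-regression update of each basis with V fixed.
        Us = [X @ V.T @ np.linalg.inv(V @ V.T + lam * I) for X in Xs]
        # With the bases fixed, V solves A V + gamma V L = R (a Sylvester equation).
        A = sum(U.T @ U for U in Us) + lam * I
        R = sum(U.T @ X for U, X in zip(Us, Xs)) @ Q
        # In the eigenbasis of L the equation decouples into one k x k solve per column.
        W = np.stack(
            [np.linalg.solve(A + gamma * ev * I, R[:, j]) for j, ev in enumerate(evals)],
            axis=1,
        )
        V = W @ Q.T
        history.append(objective(Xs, Us, V, L, lam, gamma))
    return Us, V, history
```

Because each step minimizes the objective exactly over one block of variables, the recorded objective values should not increase across iterations. The columns of V could then serve as the shared representation passed to a classifier.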
