Human action recognition using depth motion maps pyramid and discriminative collaborative representation classifier

Abstract. Human action recognition is a challenging task in machine learning and pattern recognition. This paper presents an action recognition framework based on depth sequences. Inspired by depth motion maps (DMMs), we develop an effective feature descriptor named the depth motion maps pyramid (DMMP). First, a series of DMMs at different temporal scales is constructed to capture the spatiotemporal motion patterns of human actions, and these DMMs are then fused into the final DMMP descriptor. Second, we propose a discriminative collaborative representation classifier (DCRC), in which an extra constraint on the collaborative coefficients is imposed to provide prior knowledge for the representation coefficients. DCRC is then applied to encode the obtained features and recognize the human actions. The proposed framework is evaluated on the MSR Action3D dataset, the MSR Gesture3D (hand gesture) dataset, UTD-MHAD, and the MSR DailyActivity3D dataset. The experimental results indicate the effectiveness of the proposed method for human action recognition.
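The DMM construction summarized above can be illustrated with a minimal NumPy sketch. It follows the general DMM recipe (project each depth frame onto the front, side, and top planes, then accumulate absolute differences between consecutive projections) but simplifies it under stated assumptions: the side and top views are binary occupancy maps over quantized depth, no bounding-box cropping or resizing is done, and the pyramid is a hypothetical one in which level l splits the sequence into 2^l equal temporal segments whose DMMs are simply concatenated. The function names project_views and dmm_pyramid and the parameters levels and n_bins are illustrative only, not the paper's.

```python
import numpy as np


def project_views(frame, max_depth, n_bins=64):
    """Project one depth frame (H x W) onto the front, side and top planes.

    Simplifying assumptions (not necessarily the paper's choices):
    front view = the raw depth map; side (H x n_bins) and top (n_bins x W)
    views = binary occupancy over quantized depth; depth 0 = background.
    """
    h, w = frame.shape
    side = np.zeros((h, n_bins))
    top = np.zeros((n_bins, w))
    ys, xs = np.nonzero(frame)
    # Quantize foreground depth values into n_bins bins using a sequence-wide max.
    zs = np.minimum((frame[ys, xs] / max_depth * (n_bins - 1)).astype(int), n_bins - 1)
    side[ys, zs] = 1.0
    top[zs, xs] = 1.0
    return frame, side, top


def dmm_pyramid(frames, levels=3, n_bins=64):
    """Hypothetical DMM pyramid: for each view, level l splits the T-1
    frame-to-frame differences into 2**l consecutive segments, sums |diff|
    within each segment (one DMM per segment) and concatenates everything."""
    frames = np.asarray(frames, dtype=float)  # shape (T, H, W)
    if frames.ndim != 3 or len(frames) < 2:
        raise ValueError("expected a (T, H, W) depth sequence with T >= 2")
    max_depth = frames.max() if frames.max() > 0 else 1.0
    proj = [project_views(f, max_depth, n_bins) for f in frames]
    features = []
    for v in range(3):  # 0 = front, 1 = side, 2 = top
        stack = np.stack([p[v] for p in proj])
        diffs = np.abs(np.diff(stack, axis=0))  # motion energy between consecutive frames
        for level in range(levels):
            for idx in np.array_split(np.arange(len(diffs)), 2 ** level):
                # Summing an empty segment yields an all-zero map, keeping the length fixed.
                features.append(diffs[idx].sum(axis=0).ravel())
    return np.concatenate(features)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq = rng.integers(0, 100, size=(5, 4, 6))  # toy 5-frame, 4x6 depth sequence
    print(dmm_pyramid(seq, levels=3, n_bins=8).shape)  # (728,)
```

With levels=3 the sketch keeps 1 + 2 + 4 = 7 DMMs per view. The coarsest level is the ordinary whole-sequence DMM; the finer levels retain information about when motion occurs, which a single DMM discards by summing over the whole sequence.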

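The classification step builds on collaborative representation classification (CRC), which codes a test feature y over all training features X with an l2-regularized least-squares fit and assigns the class with the smallest regularized residual. The abstract does not give the exact form of the extra constraint in DCRC, so the sketch below is plain CRC plus a hypothetical prior term gamma * ||a - alpha_prior||^2 standing in for it; with gamma = 0 it reduces to ordinary CRC. The objective then has the closed-form solution (X^T X + (lam + gamma) I) a = X^T y + gamma * alpha_prior. The names crc_classify, gamma, and alpha_prior are illustrative assumptions, not the paper's notation.

```python
import numpy as np


def crc_classify(X, labels, y, lam=1e-2, gamma=0.0, alpha_prior=None):
    """Collaborative-representation classification of one test feature y.

    X      : (d, n) matrix, columns = l2-normalized training features
    labels : (n,) class label of each column
    Solves  min_a ||y - X a||^2 + lam ||a||^2 + gamma ||a - alpha_prior||^2.
    gamma = 0 gives plain CRC; the gamma term is a HYPOTHETICAL stand-in for
    DCRC's extra constraint, whose exact form is not given in the abstract.
    Returns (predicted label, coding vector a).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[1]
    prior = np.zeros(n) if alpha_prior is None else np.asarray(alpha_prior, dtype=float)
    # Setting the gradient to zero gives (X^T X + (lam + gamma) I) a = X^T y + gamma * prior.
    a = np.linalg.solve(X.T @ X + (lam + gamma) * np.eye(n), X.T @ y + gamma * prior)
    best_label, best_score = None, np.inf
    for c in np.unique(labels):
        mask = labels == c
        # CRC decision rule: class residual divided by the class coefficient norm.
        residual = np.linalg.norm(y - X[:, mask] @ a[mask])
        score = residual / max(np.linalg.norm(a[mask]), 1e-12)
        if score < best_score:
            best_label, best_score = c, score
    return best_label, a
```

Because the system matrix depends only on the training set, its inverse (or a factorization of it) can be computed once and reused for every test sample; this closed form is the main speed advantage of collaborative representation over l1-based sparse representation.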