A Review of Dynamic Maps for 3D Human Motion Recognition Using ConvNets and Its Improvement

RGB-D based action recognition is attracting more and more attention in both the research and industrial communities. However, due to the lack of training data, pre-training based methods are popular in this field. This paper presents a review of the concept of dynamic maps for RGB-D based human motion recognition using pretrained models in image domain. The dynamic maps recursively encode the spatial, temporal and structural information contained in the video sequence into dynamic motion images simultaneously. They enable the usage of Convolutional Neural Network and its pretained models on ImageNet for 3D human motion recognition. This simple, compact and effective representation achieves state-of-the-art results on various gesture/action/activities recognition datasets. Based on the review of previous methods using this concept upon different modalities (depth, skeleton or RGB-D data), a novel encoding scheme is developed and presented in this paper. The improved method generates effective flow-guided dynamic maps, and they could select the high motion window and distinguish the order among the frames with small motion. The improved flow-guided dynamic maps achieve state-of-the-art results on the large Chalearn LAP IsoGD and NTU RGB+D datasets.

[1]  Nasser Kehtarnavaz,et al.  UTD-MHAD: A multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor , 2015, 2015 IEEE International Conference on Image Processing (ICIP).

[2]  Andrew Zisserman,et al.  Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.

[3]  Toby Sharp,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR.

[4]  Kristen Grauman,et al.  Slow and Steady Feature Analysis: Higher Order Temporal Coherence in Video , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Gang Wang,et al.  NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Andrea Vedaldi,et al.  Dynamic Image Networks for Action Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Jake K. Aggarwal,et al.  View invariant human action recognition using histograms of 3D joints , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[8]  Mohan S. Kankanhalli,et al.  Benchmarking a Multimodal and Multiview and Interactive Dataset for Human Action Recognition , 2017, IEEE Transactions on Cybernetics.

[9]  Pichao Wang,et al.  Mining Mid-Level Features for Action Recognition Based on Effective Skeleton Representation , 2014, 2014 International Conference on Digital Image Computing: Techniques and Applications (DICTA).

[10]  Lorenzo Torresani,et al.  Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[11]  Jing Zhang,et al.  ConvNets-Based Action Recognition from Depth Maps through Virtual Cameras and Pseudocoloring , 2015, ACM Multimedia.

[12]  Xiaodong Yang,et al.  EigenJoints-based action recognition using Naïve-Bayes-Nearest-Neighbor , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[13]  Rama Chellappa,et al.  Human Action Recognition by Representing 3D Skeletons as Points in a Lie Group , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Zicheng Liu,et al.  HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Guo-Jun Qi,et al.  Differential Recurrent Neural Networks for Action Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[16]  Cewu Lu,et al.  Range-Sample Depth Feature for Action Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Pichao Wang,et al.  Large-Scale Multimodal Gesture Segmentation and Recognition Based on Convolutional Neural Networks , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[18]  Pichao Wang,et al.  Scene Flow to Action Map: A New Representation for RGB-D Based Action Recognition with Convolutional Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Matthew J. Hausknecht,et al.  Beyond short snippets: Deep networks for video classification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Yang Xiao,et al.  Action Recognition for Depth Video using Multi-view Dynamic Images , 2018, Inf. Sci..

[21]  Wanqing Li,et al.  Action recognition based on a bag of 3D points , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[22]  Jun Wan,et al.  A Unified Framework for Multi-Modal Isolated Gesture Recognition , 2018, ACM Trans. Multim. Comput. Commun. Appl..

[23]  Xinyu Wu,et al.  The spatial Laplacian and temporal energy pyramid representation for human action recognition using depth sequences , 2017, Knowl. Based Syst..

[24]  Pichao Wang,et al.  Skeleton Optical Spectra-Based Action Recognition Using Convolutional Neural Networks , 2018, IEEE Transactions on Circuits and Systems for Video Technology.

[25]  Sergio Escalera,et al.  ChaLearn Looking at People RGB-D Isolated and Continuous Datasets for Gesture Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[26]  Jing Zhang,et al.  Action Recognition From Depth Maps Using Deep Convolutional Neural Networks , 2016, IEEE Transactions on Human-Machine Systems.

[27]  Pichao Wang,et al.  Large-Scale Multimodal Gesture Recognition Using Heterogeneous Networks , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[28]  Nitish Srivastava,et al.  Unsupervised Learning of Video Representations using LSTMs , 2015, ICML.

[29]  Pichao Wang,et al.  Joint Distance Maps Based Action Recognition With Convolutional Neural Networks , 2017, IEEE Signal Processing Letters.

[30]  Juan Song,et al.  Multimodal Gesture Recognition Using 3-D Convolution and Convolutional LSTM , 2017, IEEE Access.

[31]  Xiaodong Yang,et al.  Super Normal Vector for Activity Recognition Using Depth Sequences , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Ying Wu,et al.  Mining actionlet ensemble for action recognition with depth cameras , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Hong Liu,et al.  3D Action Recognition Using Multiscale Energy-Based Global Ternary Image , 2018, IEEE Transactions on Circuits and Systems for Video Technology.

[34]  Pichao Wang,et al.  Large-scale Isolated Gesture Recognition using Convolutional Neural Networks , 2016, 2016 23rd International Conference on Pattern Recognition (ICPR).

[35]  Pichao Wang,et al.  Depth Pooling Based Large-Scale 3-D Action Recognition With Convolutional Neural Networks , 2018, IEEE Transactions on Multimedia.

[36]  Helena M. Mentis,et al.  Instructing people for training gestural interactive systems , 2012, CHI.

[37]  Yong Du,et al.  Hierarchical recurrent neural network for skeleton based action recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Pichao Wang,et al.  Action Recognition Based on Joint Trajectory Maps Using Convolutional Neural Networks , 2016, ACM Multimedia.

[39]  Gang Wang,et al.  Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition , 2016, ECCV.

[40]  Ming Yang,et al.  3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  Jun Wan,et al.  Explore Efficient Local Features from RGB-D Data for One-Shot Learning Gesture Recognition , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42]  Xiaodong Yang,et al.  Recognizing actions using depth motion maps-based histograms of oriented gradients , 2012, ACM Multimedia.

[43]  Thomas Brox,et al.  FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).