3D Action Recognition Using Multiscale Energy-Based Global Ternary Image

This paper presents an effective multiscale energy-based global ternary image (E-GTI) representation for action recognition from depth sequences. Its distinguishing property is that it accounts for both spatiotemporal discrimination and variations in action speed, addressing in a single framework the problems of distinguishing similar actions and recognizing the same action performed at different speeds. The method proceeds in two stages. In the first stage, consecutive depth frames are used to generate global ternary image (GTI) features, which implicitly capture both inter-frame motion regions and motion directions: each pixel in a GTI takes one of three states, positive, negative, or neutral, indicating an increased, decreased, or unchanged depth value, respectively. To cope with speed variations, an energy-based sampling method is applied, yielding multiscale E-GTI features in which the multiscale scheme efficiently captures the temporal relationships among frames. In the second stage, all E-GTI features are mapped by the Radon transform (RT) into robust descriptors, which are then aggregated into a compact representation by the bag-of-visual-words model. Extensive experiments on benchmark data sets show that our representation outperforms state-of-the-art approaches because it captures discriminative spatiotemporal information about actions. Owing to the merits of energy-based sampling and the RT, the representation is also robust to speed variations, depth noise, and partial occlusions.
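
As a rough illustration of the two stages described above, the sketch below ternary-quantizes the depth difference between two consecutive frames into positive, negative, and neutral states, and then converts the result into a Radon-transform descriptor. The function names, the threshold `tau`, the number of projection angles, and the use of NumPy/scikit-image are assumptions made for illustration only; they are not the authors' exact implementation.

```python
import numpy as np
from skimage.transform import radon  # one possible RT implementation (assumption)


def global_ternary_image(depth_prev, depth_curr, tau=10):
    """Ternary-quantize the inter-frame depth change (illustrative sketch).

    Each pixel is labelled +1 (depth increased), -1 (depth decreased),
    or 0 (no significant change), mirroring the positive/negative/neutral
    states of the GTI feature; `tau` is a hypothetical noise threshold.
    """
    diff = depth_curr.astype(np.int32) - depth_prev.astype(np.int32)
    gti = np.zeros_like(diff, dtype=np.int8)
    gti[diff > tau] = 1     # positive: depth value increased
    gti[diff < -tau] = -1   # negative: depth value decreased
    return gti              # neutral pixels remain 0


def gti_descriptor(gti, n_angles=180):
    """Project a GTI along a set of angles and flatten the sinogram."""
    angles = np.linspace(0.0, 180.0, n_angles, endpoint=False)
    sinogram = radon(np.abs(gti).astype(float), theta=angles, circle=False)
    return sinogram.ravel()
```

In a full pipeline along the lines of the abstract, such descriptors would be computed from energy-based frame samples at multiple temporal scales and then quantized into a bag-of-visual-words codebook to form the final compact representation.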
