Real time action recognition using histograms of depth gradients and random decision forests

We propose an algorithm which combines the discriminative information from depth images as well as from 3D joint positions to achieve high action recognition accuracy. To avoid the suppression of subtle discriminative information and also to handle local occlusions, we compute a vector of many independent local features. Each feature encodes spatiotemporal variations of depth and depth gradients at a specific space-time location in the action volume. Moreover, we encode the dominant skeleton movements by computing a local 3D joint position difference histogram. For each joint, we compute a 3D space-time motion volume which we use as an importance indicator and incorporate in the feature vector for improved action discrimination. To retain only the discriminant features, we train a random decision forest (RDF). The proposed algorithm is evaluated on three standard datasets and compared with nine state-of-the-art algorithms. Experimental results show that, on the average, the proposed algorithm outperform all other algorithms in accuracy and have a processing speed of over 112 frames/second.

[1]  Mario Fernando Montenegro Campos,et al.  STOP: Space-Time Occupancy Patterns for 3D Action Recognition from Depth Map Sequences , 2012, CIARP.

[2]  Ming Yang,et al.  3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Xiaodong Yang,et al.  Recognizing actions using depth motion maps-based histograms of oriented gradients , 2012, ACM Multimedia.

[4]  Ying Wu,et al.  Mining actionlet ensemble for action recognition with depth cameras , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Xiaodong Yang,et al.  EigenJoints-based action recognition using Naïve-Bayes-Nearest-Neighbor , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[6]  Lale Akarun,et al.  Real time hand pose estimation using depth sensors , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[7]  J. Ross Quinlan,et al.  Induction of Decision Trees , 1986, Machine Learning.

[8]  Wanqing Li,et al.  Action recognition based on a bag of 3D points , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[9]  James M. Keller,et al.  Histogram of Oriented Normal Vectors for Object Recognition with a Depth Sensor , 2012, ACCV.

[10]  Zicheng Liu,et al.  HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[12]  Ying Wu,et al.  Robust 3D Action Recognition with Random Occupancy Patterns , 2012, ECCV.

[13]  Z. Liu,et al.  A real time system for dynamic hand gesture recognition with a depth sensor , 2012, 2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO).

[14]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[15]  Cordelia Schmid,et al.  A Spatio-Temporal Descriptor Based on 3D-Gradients , 2008, BMVC.