Effective 3D action recognition using EigenJoints

HighlightsEffective method to recognize human actions using 3D skeleton joints.New action feature descriptor, EigenJoints, for action recognition.Accumulated Motion Energy (AME) method to perform informative frames selection.Our proposed approach significantly outperforms the state-of-the-art methods on three public datasets. In this paper, we propose an effective method to recognize human actions using 3D skeleton joints recovered from 3D depth data of RGBD cameras. We design a new action feature descriptor for action recognition based on differences of skeleton joints, i.e., EigenJoints which combine action information including static posture, motion property, and overall dynamics. Accumulated Motion Energy (AME) is then proposed to perform informative frame selection, which is able to remove noisy frames and reduce computational cost. We employ non-parametric Naive-Bayes-Nearest-Neighbor (NBNN) to classify multiple actions. The experimental results on several challenging datasets demonstrate that our approach outperforms the state-of-the-art methods. In addition, we investigate how many frames are necessary for our method to perform classification in the scenario of online action recognition. We observe that the first 30-40% frames are sufficient to achieve comparable results to that using the entire video sequences on the MSR Action3D dataset.

[1]  Liangliang Cao,et al.  MediaCCNY at TRECVID 2012: Surveillance Event Detection , 2012, TRECVID.

[2]  Wanqing Li,et al.  Action recognition based on a bag of 3D points , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[3]  Jenq-Neng Hwang,et al.  Integrated video object tracking with applications in trajectory-based event detection , 2011, J. Vis. Commun. Image Represent..

[4]  Deva Ramanan,et al.  Detecting activities of daily living in first-person camera views , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Dacheng Tao,et al.  Slow Feature Analysis for Human Action Recognition , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Jake K. Aggarwal,et al.  View invariant human action recognition using histograms of 3D joints , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[7]  Xiaodong Yang,et al.  EigenJoints-based action recognition using Naïve-Bayes-Nearest-Neighbor , 2012, 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[8]  Gang Yu,et al.  Real-time human action search using random forest based hough voting , 2011, ACM Multimedia.

[9]  Ming Yang,et al.  Surveillance Event Detection , 2008, TRECVID.

[10]  David G. Lowe,et al.  Local Naive Bayes Nearest Neighbor for image classification , 2011, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  James W. Davis,et al.  The Recognition of Human Movement Using Temporal Templates , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[12]  Jintao Li,et al.  Hierarchical spatio-temporal context modeling for action recognition , 2009, CVPR.

[13]  Cordelia Schmid,et al.  Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Luc Van Gool,et al.  Action snippets: How many frames does human action recognition require? , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Bart Selman,et al.  Unstructured human activity detection from RGBD images , 2011, 2012 IEEE International Conference on Robotics and Automation.

[16]  Toby Sharp,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR.

[17]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[18]  G. Johansson Visual perception of biological motion and a model for its analysis , 1973 .

[19]  Ying Wu,et al.  Mining actionlet ensemble for action recognition with depth cameras , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Dacheng Tao,et al.  This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS 1 Cross-Domain Human Action Recognition , 2022 .

[21]  Xiaodong Yang,et al.  Histogram of 3D Facets: A characteristic descriptor for hand gesture recognition , 2013, 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[22]  Sunil Arya,et al.  Expected-case complexity of approximate nearest neighbor searching , 2000, SODA '00.

[23]  Wei Liang,et al.  Discriminative human action recognition in the learned hierarchical manifold space , 2010, Image Vis. Comput..

[24]  Eli Shechtman,et al.  In defense of Nearest-Neighbor based image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Joseph J. LaViola,et al.  Exploring the Trade-off Between Accuracy and Observational Latency in Action Recognition , 2013, International Journal of Computer Vision.

[26]  Marco Roccetti,et al.  Playing into the wild: A gesture-based interface for gaming in public spaces , 2012, J. Vis. Commun. Image Represent..

[27]  Min Sun,et al.  Conditional regression forests for human pose estimation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Andrew W. Fitzgibbon,et al.  Efficient regression of general-activity human poses from depth images , 2011, 2011 International Conference on Computer Vision.

[29]  Cordelia Schmid,et al.  A Spatio-Temporal Descriptor Based on 3D-Gradients , 2008, BMVC.

[30]  Hatice Gunes,et al.  Automatic Temporal Segment Detection and Affect Recognition From Face and Body Display , 2009, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[31]  Ming-Ting Sun,et al.  Automatic video activity detection using compressed domain motion trajectories for H.264 videos , 2011, J. Vis. Commun. Image Represent..

[32]  Xiaodong Yang,et al.  Recognizing actions using depth motion maps-based histograms of oriented gradients , 2012, ACM Multimedia.