EigenJoints-based action recognition using Naïve-Bayes-Nearest-Neighbor

In this paper, we propose an effective method to recognize human actions from 3D positions of body joints. With the release of RGBD sensors and associated SDK, human body joints can be extracted in real time with reasonable accuracy. In our method, we propose a new type of features based on position differences of joints, EigenJoints, which combine action information including static posture, motion, and offset. We further employ the Naïve-Bayes-Nearest-Neighbor (NBNN) classifier for multi-class action classification. The recognition results on the Microsoft Research (MSR) Action3D dataset demonstrate that our approach significantly outperforms the state-of-the-art methods. In addition, we investigate how many frames are necessary for our method to recognize actions on the MSR Action3D dataset. We observe 15-20 frames are sufficient to achieve comparable results to that using the entire video sequences.

[1]  G. Johansson Visual perception of biological motion and a model for its analysis , 1973 .

[2]  Sunil Arya,et al.  Expected-case complexity of approximate nearest neighbor searching , 2000, SODA '00.

[3]  Mubarak Shah,et al.  View-invariance in action recognition , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[4]  James W. Davis,et al.  The Recognition of Human Movement Using Temporal Templates , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[5]  Ivan Laptev,et al.  On Space-Time Interest Points , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[6]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[7]  Cordelia Schmid,et al.  Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Cordelia Schmid,et al.  A Spatio-Temporal Descriptor Based on 3D-Gradients , 2008, BMVC.

[9]  Eli Shechtman,et al.  In defense of Nearest-Neighbor based image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Luc Van Gool,et al.  Action snippets: How many frames does human action recognition require? , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Jintao Li,et al.  Hierarchical spatio-temporal context modeling for action recognition , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Wanqing Li,et al.  Action recognition based on a bag of 3D points , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[13]  Joseph J. LaViola,et al.  Measuring and reducing observational latency when recognizing actions , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[14]  Toby Sharp,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR.