Human Action Recognition Using APJ3D and Random Forests

Human action recognition is an important yet challenging task. In this paper, a simple and efficient method based on random forests is proposed for human action recognition. First, we extract the 3D skeletal joint locations from depth images. The APJ3D representation is then computed from the action depth image sequences using the 3D joint position features and the 3D joint angle features, and clustered with the K-means algorithm; the resulting clusters represent the typical postures of actions. Using an improved Fourier Temporal Pyramid, we then recognize actions with random forests. The proposed method is evaluated on a public video dataset, the UTKinect-Action dataset, which consists of 200 3D sequences of 10 activities performed by 10 individuals from varied views. Experimental results show that the 3D skeletal joint location estimation is robust and that the proposed method performs very well on this dataset. In addition, owing to the design of our method and the robust 3D skeletal joint location estimation from the RGB-D sensor, our method is notably reliable against noise on the 3D action dataset.
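The feature stages named in the abstract can be illustrated with a minimal sketch. The abstract does not say which joints, angles, or reference joint APJ3D uses, nor how the Fourier Temporal Pyramid is "improved", so everything below is an assumption: the function names, the choice of joint 0 as reference, the angle triples, and the use of a plain (not improved) Fourier temporal pyramid are all hypothetical. The K-means and random forest steps are not shown.

```python
import cmath
import math


def relative_positions(joints, ref=0):
    """Positions of all joints relative to a reference joint (assumed: index 0)."""
    rx, ry, rz = joints[ref]
    return [(x - rx, y - ry, z - rz)
            for i, (x, y, z) in enumerate(joints) if i != ref]


def joint_angle(a, b, c):
    """Angle in radians at joint b between the segments b->a and b->c."""
    u = [a[i] - b[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    dot = sum(ui * vi for ui, vi in zip(u, v))
    norm = math.sqrt(sum(ui * ui for ui in u)) * math.sqrt(sum(vi * vi for vi in v))
    if norm == 0.0:
        return 0.0  # degenerate segment: angle undefined, 0 used by convention
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def frame_features(joints, angle_triples, ref=0):
    """Concatenate relative joint positions and joint angles for one frame."""
    feats = [coord for p in relative_positions(joints, ref) for coord in p]
    feats += [joint_angle(joints[a], joints[b], joints[c])
              for a, b, c in angle_triples]
    return feats


def low_freq_magnitudes(signal, n_coeffs):
    """Magnitudes of the first n_coeffs DFT coefficients of a 1D signal."""
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n_coeffs)]


def fourier_temporal_pyramid(signal, levels=2, n_coeffs=2):
    """Plain Fourier temporal pyramid over one feature dimension.

    Level l splits the signal into 2**l segments; the low-frequency DFT
    magnitudes of every segment are concatenated.
    Assumes len(signal) >= 2 ** (levels - 1) so no segment is empty.
    """
    desc = []
    for level in range(levels):
        parts = 2 ** level
        size = len(signal) // parts
        for p in range(parts):
            # The last segment absorbs any leftover frames.
            end = len(signal) if p == parts - 1 else (p + 1) * size
            desc += low_freq_magnitudes(signal[p * size:end], n_coeffs)
    return desc
```

In such a sketch, `frame_features` would be applied to every frame of a sequence, and `fourier_temporal_pyramid` to the time series of each feature dimension; the concatenated descriptors could then be passed to any random forest implementation.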
