Efficient action recognition via local position offset of 3D skeletal body joints

Recognizing human actions accurately and with little computation time is important for practical use. This paper presents an efficient framework for recognizing actions with an RGB-D camera. The framework extracts novel action patterns by computing the position offsets of 3D skeletal body joints locally over the temporal extent of the video. Action recognition is then performed by assembling these offset vectors in a bag-of-words framework while also taking the spatial independence of body joints into account. We conducted extensive experiments on two benchmark datasets, the UCF dataset and the MSRC-12 dataset, to demonstrate the effectiveness of the proposed framework. Experimental results suggest that the proposed framework 1) extracts action patterns very quickly and is very simple to implement; and 2) achieves recognition accuracy comparable to or better than state-of-the-art approaches.
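
The pipeline outlined in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact offset definition, the codebook construction, or the classifier, so everything below is an assumption made for illustration: the offset is taken as the displacement of a joint between frame t and frame t + gap, the codebook is supplied by the caller (in practice it might be learned by clustering training offsets), and "spatial independence" is read as building one bag-of-words histogram per joint. The names `local_offsets`, `bow_histogram`, `describe`, and `gap` are hypothetical and do not come from the paper.

```python
from collections import Counter
from math import dist


def local_offsets(frames, gap=1):
    """Per-joint offset vectors between frame t and frame t + gap.

    frames: list of frames, each a list of (x, y, z) joint positions.
    Assumption: the offset is the plain displacement over `gap` frames.
    """
    n_joints = len(frames[0])
    per_joint = [[] for _ in range(n_joints)]
    for t in range(len(frames) - gap):
        for j in range(n_joints):
            a, b = frames[t][j], frames[t + gap][j]
            per_joint[j].append(tuple(q - p for p, q in zip(a, b)))
    return per_joint


def bow_histogram(offsets, codebook):
    """Assign each offset to its nearest codeword; return a normalized histogram."""
    counts = Counter(
        min(range(len(codebook)), key=lambda k: dist(v, codebook[k]))
        for v in offsets
    )
    total = len(offsets) or 1  # avoid division by zero for very short clips
    return [counts[k] / total for k in range(len(codebook))]


def describe(frames, codebooks, gap=1):
    """One histogram per joint, each joint treated independently (assumed reading)."""
    offsets = local_offsets(frames, gap)
    return [bow_histogram(o, cb) for o, cb in zip(offsets, codebooks)]


# Toy clip: joint 0 moves one unit along x per frame, joint 1 stays still.
frames = [
    [(0, 0, 0), (5, 5, 5)],
    [(1, 0, 0), (5, 5, 5)],
    [(2, 0, 0), (5, 5, 5)],
]
codebook = [(0, 0, 0), (1, 0, 0)]  # hypothetical codewords: "still", "step in +x"
print(describe(frames, [codebook, codebook]))
```

The per-joint histograms could then be concatenated and passed to any standard classifier. The offsets need only subtractions between nearby frames, which is consistent with the abstract's claim that pattern extraction is fast and simple to implement.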
