Combining Pose-Invariant Kinematic Features and Object Context Features for RGB-D Action Recognition

Action recognition using RGB-D cameras is a popular research topic. Recognising actions in a pose-invariant manner is very challenging due to view changes, posture changes and large intra-class variations. This study proposes a novel pose-invariant action recognition framework based on kinematic features and object context features. Using RGB, depth and skeletal joints, the proposed framework extracts a novel set of pose-invariant motion kinematic features based on 3D scene flow, capturing the motion of body parts with respect to the body. The obtained features are converted to a human-body-centric space that allows partially view-invariant recognition of actions. The proposed pose-invariant kinematic features are extracted for both the foreground (RGB and depth) and the skeleton joints, and separate classifiers are trained for each. Borda-count-based classifier decision fusion is employed to obtain an action recognition result. To capture object context features, a convolutional neural network (CNN) classifier is proposed to identify the objects involved. The proposed context features also include temporal information on object interaction and help in obtaining the final action recognition result. The proposed framework works even with non-upright human postures and allows simultaneous action recognition for multiple people, topics that remain comparatively underexplored. The performance and robustness of the proposed pose-invariant action recognition framework are evaluated on several benchmark datasets. We also show that the proposed method runs in real time.
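The abstract does not spell out how features are mapped into the body-centric space; a minimal sketch of one plausible construction, assuming the frame is anchored at the hip centre with axes derived from the shoulder line and spine (the function and joint names are hypothetical illustrations, not the authors' implementation):

```python
import numpy as np

def to_body_centric(points, hip_center, left_shoulder, right_shoulder, spine_top):
    """Map 3-D points from camera coordinates into a body-centred frame.

    The frame is anchored at the hip centre; its x-axis follows the
    shoulder line, its y-axis the spine direction, and its z-axis the
    facing direction. Expressing joint positions or scene-flow vectors
    in this frame removes the subject's global position and orientation,
    which is what makes the features (partially) view-invariant.
    """
    x = right_shoulder - left_shoulder        # across the body
    y = spine_top - hip_center                # up the spine
    z = np.cross(x, y)                        # facing direction
    y = np.cross(z, x)                        # re-orthogonalise the spine axis
    R = np.stack([x / np.linalg.norm(x),
                  y / np.linalg.norm(y),
                  z / np.linalg.norm(z)])     # rows are the unit body axes
    # Translate to the hip centre, then project onto the body axes.
    return (points - hip_center) @ R.T
```

Because the axes rotate with the torso, the same transform can be applied to non-upright postures (e.g. lying down), consistent with the framework's stated scope.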
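Borda-count decision fusion itself is a standard rank-aggregation scheme; a small self-contained sketch of how the per-modality classifier rankings could be fused (the classifier names and class count are illustrative assumptions):

```python
import numpy as np

def borda_count_fusion(rank_lists, n_classes):
    """Fuse per-classifier class rankings via Borda count.

    rank_lists: list of 1-D arrays, each listing class indices from most
    to least likely according to one classifier.
    Returns the index of the winning action class.
    """
    scores = np.zeros(n_classes)
    for ranking in rank_lists:
        for position, cls in enumerate(ranking):
            # First-ranked class earns n_classes - 1 points, last earns 0.
            scores[cls] += n_classes - 1 - position
    return int(np.argmax(scores))

# Example: three modality classifiers (RGB, depth, skeleton), 4 classes.
rgb_rank   = np.array([2, 0, 1, 3])
depth_rank = np.array([2, 1, 0, 3])
skel_rank  = np.array([0, 2, 3, 1])
print(borda_count_fusion([rgb_rank, depth_rank, skel_rank], n_classes=4))  # -> 2
```

Because Borda count operates on rankings rather than raw scores, it requires no calibration of confidence values across the RGB, depth and skeleton classifiers, which makes it a convenient fusion rule for heterogeneous modalities.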
