Posture Recognition Using an RGB-D Camera: Exploring 3D Body Modeling and Deep Learning Approaches

The emergence of RGB-D sensors has opened new possibilities for addressing complex computer vision problems efficiently. Human posture recognition is one such problem, with a wide range of applications such as ambient assisted living and intelligent health care systems. In this context, our paper presents novel methods and ideas for designing automatic posture recognition systems using an RGB-D camera. More specifically, we introduce two supervised methods to learn and recognize human postures using the main types of visual data provided by an RGB-D camera. The first method is based on convolutional features extracted from 2D images: Convolutional Neural Networks (CNNs) are trained to recognize human postures using transfer learning on RGB and depth images. The second method models the posture through the configuration of the body joints in 3D space; posture recognition is then performed through SVM classification of 3D skeleton-based features. To evaluate the proposed methods, we created a challenging posture recognition dataset with considerable variability in acquisition conditions. The experimental results show comparable performance and high precision for both methods, with a slight advantage for the CNN-based method when applied to depth images. Moreover, both approaches proved highly robust to several perturbation factors, such as changes in scale and orientation.
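The abstract describes the skeleton-based method only at a high level (3D skeleton-based features classified with an SVM). The following is a minimal sketch of such a pipeline under explicit assumptions, not the authors' implementation. A skeleton is assumed to be a (J, 3) NumPy array of joint coordinates. Joint 0 is a hypothetical reference joint, such as the spine base. The features are the joint positions re-expressed relative to that joint and rescaled by their RMS distance to it, which is one plausible descriptor since the paper's exact features are not given here. The classifier is a linear SVM trained by Pegasos-style stochastic sub-gradient descent with a one-vs-rest wrapper, swapped in for whatever SVM implementation and kernel the authors actually used. The training data is synthetic and only exercises the code.

```python
import numpy as np


def skeleton_features(joints, ref=0):
    """Flatten a (J, 3) array of 3D joint positions into a posture descriptor.

    Hypothetical choices (not taken from the paper): joint `ref` is assumed to
    be a stable reference such as the spine base, and the skeleton is rescaled
    by its RMS distance to that joint.
    """
    joints = np.asarray(joints, dtype=float)
    centered = joints - joints[ref]                      # remove translation
    scale = np.sqrt(np.mean(np.sum(centered ** 2, axis=1)))
    if scale == 0:
        raise ValueError("degenerate skeleton: all joints coincide")
    return (centered / scale).ravel()                    # remove body-size scale


def train_linear_svm(X, y, lam=1e-2, epochs=50, seed=0):
    """Binary linear SVM (y in {-1, +1}) trained with Pegasos-style stochastic
    sub-gradient descent on the L2-regularized hinge loss. The bias is folded
    into the weight vector (and therefore regularized), a common simplification."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])       # append constant bias feature
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)                        # Pegasos step size
            violates = y[i] * (Xb[i] @ w) < 1            # check the margin before updating
            w *= 1.0 - eta * lam                         # shrink: regularizer gradient step
            if violates:
                w += eta * y[i] * Xb[i]                  # hinge-loss sub-gradient step
    return w


def train_ovr(X, labels, **kwargs):
    """One-vs-rest multi-class wrapper: one binary SVM per posture class."""
    classes = np.unique(labels)
    W = np.stack([train_linear_svm(X, np.where(labels == c, 1.0, -1.0), **kwargs)
                  for c in classes])
    return classes, W


def predict(classes, W, X):
    """Assign each feature vector to the class with the highest SVM score."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return classes[np.argmax(Xb @ W.T, axis=1)]


def make_toy_data(n_per_class=30, n_joints=20, n_classes=3, seed=1):
    """Synthetic skeletons for illustration only (not real data): each class is a
    random template pose that is translated, rescaled, and slightly jittered."""
    rng = np.random.default_rng(seed)
    templates = rng.normal(size=(n_classes, n_joints, 3))
    X, y = [], []
    for c in range(n_classes):
        for _ in range(n_per_class):
            s = rng.uniform(0.5, 2.0)
            shift = rng.normal(scale=3.0, size=3)
            noisy = templates[c] + rng.normal(scale=0.05, size=(n_joints, 3))
            X.append(skeleton_features(s * noisy + shift))
            y.append(c)
    return np.array(X), np.array(y)


if __name__ == "__main__":
    X, y = make_toy_data()
    classes, W = train_ovr(X, y)
    print("training accuracy on toy data:", np.mean(predict(classes, W, X) == y))
```

Normalizing translation and scale before classification makes such a descriptor insensitive to where the subject stands and how far they are from the camera. This sketch does not normalize orientation. That would additionally require, for example, rotating each skeleton into a body-centred frame.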