Utilization of Color-depth Combination Features and Multi-level Refinement CNN for Upper-limb Posture Recognition

Upper-limb posture recognition is of great value for the rehabilitation and assessment of stroke patients. In this paper, we propose a novel method for upper-limb posture recognition. A convolutional neural network (CNN) cascade is applied to reduce the training difficulty of the algorithm. Depth and color information are combined to eliminate the influence of complex backgrounds and illumination variation. A Kinect is used to automatically acquire a large number of upper-limb posture labels. The whole algorithm follows a coarse-to-fine principle, and the overall network architecture consists of 3 levels containing a total of 6 CNNs. First, the color and depth images are aligned to obtain a four-channel RGB-D image. This image is fed to the level-1 network of the cascade, which outputs a bounding box that crops the upper limb from the entire body. The bounding box is then passed to the level-2 network, which produces rough coordinates for 4 upper-limb joints. Finally, the field of view is narrowed to local regions around these 4 key points, and the level-3 network refines them into accurate coordinates. Experimental results show that the upper-limb posture computed by the proposed algorithm is highly stable under both illumination changes and complex backgrounds.
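The coarse-to-fine data flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the trained CNNs of each level are replaced by hypothetical callables (`level1`, `level2`, `level3`), the color and depth frames are assumed to be already registered pixel-to-pixel, nested lists stand in for image tensors, the patch half-width `half` is an arbitrary choice, and `nearest_pixel` is a toy stand-in for a level-3 regressor, not a CNN. The sketch only shows how coordinates move between the full image, the level-1 crop, and the level-3 local patches.

```python
from typing import Callable, List, Sequence, Tuple

# Nested-list stand-ins for image arrays (a real system would use tensors).
Pixel = List[float]                 # [R, G, B, D]
Image = List[List[Pixel]]           # rows x cols x channels
Box = Tuple[int, int, int, int]     # (x0, y0, x1, y1), x1/y1 exclusive
Point = Tuple[float, float]         # (x, y)


def make_rgbd(rgb: Sequence[Sequence[Sequence[float]]],
              depth: Sequence[Sequence[float]]) -> Image:
    """Append the depth value to each color pixel, giving a 4-channel RGB-D image.

    Assumes color and depth are already registered pixel-to-pixel.
    """
    if len(rgb) != len(depth) or any(len(r) != len(d) for r, d in zip(rgb, depth)):
        raise ValueError("color and depth images must be aligned to the same size")
    return [[list(px) + [z] for px, z in zip(row, drow)]
            for row, drow in zip(rgb, depth)]


def crop(img: Image, box: Box) -> Image:
    """Return the sub-image inside box (x1 and y1 are exclusive)."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in img[y0:y1]]


def cascade(rgbd: Image,
            level1: Callable[[Image], Box],
            level2: Callable[[Image], List[Point]],
            level3: List[Callable[[Image], Point]],
            half: int = 8) -> List[Point]:
    """Coarse-to-fine joint localisation; every model returns crop-relative coordinates."""
    height, width = len(rgbd), len(rgbd[0])

    # Level 1: coarse box around the upper limb in the whole-body image.
    box = level1(rgbd)
    bx0, by0 = box[0], box[1]

    # Level 2: rough joints predicted inside the box, shifted back to image coordinates.
    rough = [(bx0 + x, by0 + y) for x, y in level2(crop(rgbd, box))]
    if len(rough) != len(level3):
        raise ValueError("need exactly one level-3 refiner per rough joint")

    # Level 3: each refiner sees only a small patch centred on its rough estimate,
    # clipped to the image borders; its output is shifted back by the patch origin.
    refined = []
    for (x, y), refine in zip(rough, level3):
        px0, py0 = max(int(x) - half, 0), max(int(y) - half, 0)
        px1, py1 = min(int(x) + half, width), min(int(y) + half, height)
        dx, dy = refine(crop(rgbd, (px0, py0, px1, py1)))
        refined.append((px0 + dx, py0 + dy))
    return refined


def nearest_pixel(patch: Image) -> Point:
    """Toy level-3 stand-in (not a CNN): location of the smallest depth in the patch."""
    _, col, row_idx = min((px[3], c, r) for r, row in enumerate(patch)
                          for c, px in enumerate(row))
    return (float(col), float(row_idx))
```

For example, with an all-black 40×40 color frame, a constant depth of 2.0 except for two closer pixels at (15, 22) and (30, 25), a stub level-1 box (10, 10, 38, 38), and stub level-2 guesses (7, 10) and (18, 13) inside that box, calling `cascade` with `[nearest_pixel, nearest_pixel]` returns `[(15.0, 22.0), (30.0, 25.0)]`. Because each level predicts coordinates relative to its own crop, the box origin and each patch origin must be added back to return to full-image coordinates.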
