A Sequential Approach to 3D Human Pose Estimation: Separation of Localization and Identification of Body Joints

In this paper, we propose a new approach to 3D human pose estimation from a single depth image. Conventionally, 3D human pose estimation is formulated as a detection problem of the desired list of body joints. Most of the previous methods attempted to simultaneously localize and identify body joints, with the expectation that the accomplishment of one task would facilitate the accomplishment of the other. However, we believe that identification hampers localization; therefore, the two tasks should be solved separately for enhanced pose estimation performance. We propose a two-stage framework that initially estimates all the locations of joints and subsequently identifies the estimated joints for a specific pose. The locations of joints are estimated by regressing K closest joints from every pixel with the use of a random tree. The identification of joints are realized by transferring labels from a retrieved nearest exemplar model. Once the 3D configuration of all the joints is derived, identification becomes much easier than when it is done simultaneously with localization, exploiting the reduced solution space. Our proposed method achieves significant performance gain on pose estimation accuracy, thereby improving both localization and identification. Experimental results show that the proposed method exhibits an accuracy significantly higher than those of previous approaches that simultaneously localize and identify the body parts.

[1]  Min Sun,et al.  Conditional regression forests for human pose estimation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Chen Qian,et al.  Realtime and Robust Hand Tracking from Depth , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Ho Yub Jung,et al.  Random tree walk toward instantaneous 3D human pose estimation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Andrew W. Fitzgibbon,et al.  Efficient regression of general-activity human poses from depth images , 2011, 2011 International Conference on Computer Vision.

[5]  Ruigang Yang,et al.  Real-Time Simultaneous Pose and Shape Estimation for Articulated Objects Using a Single Depth Camera , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Danica Kragic,et al.  Monocular real-time 3D articulated hand pose estimation , 2009, 2009 9th IEEE-RAS International Conference on Humanoid Robots.

[7]  S. Umeyama,et al.  Least-Squares Estimation of Transformation Parameters Between Two Point Patterns , 1991, IEEE Trans. Pattern Anal. Mach. Intell..

[8]  Hans-Peter Seidel,et al.  Personalization and Evaluation of a Real-Time Depth-Based Full Body Tracker , 2013, 2013 International Conference on 3D Vision.

[9]  Sebastian Thrun,et al.  Real-Time Human Pose Tracking from Range Data , 2012, ECCV.

[10]  Reinhard Koch,et al.  Single View Motion Tracking by Depth and Silhouette Information , 2007, SCIA.

[11]  J. A. Hartigan,et al.  A k-means clustering algorithm , 1979 .

[12]  Yi Yang,et al.  Complex Event Detection using Semantic Saliency and Nearly-Isotonic SVM , 2015, ICML.

[13]  Juergen Gall,et al.  Class-specific Hough forests for object detection , 2009, CVPR.

[14]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[15]  Jinxiang Chai,et al.  Accurate realtime full-body motion capture using a single depth camera , 2012, ACM Trans. Graph..

[16]  Toby Sharp,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR.

[17]  Daniel Thalmann,et al.  Model-based hand pose estimation via spatial-temporal hand parsing and 3D fingertip localization , 2013, The Visual Computer.

[18]  Antonio Criminisi,et al.  Decision Forests for Computer Vision and Medical Image Analysis , 2013, Advances in Computer Vision and Pattern Recognition.

[19]  B. Triggs,et al.  3D human pose from silhouettes by relevance vector regression , 2004, CVPR 2004.

[20]  Cristian Sminchisescu,et al.  Iterated Second-Order Label Sensitive Pooling for 3D Human Pose Estimation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Sudipto Guha,et al.  CURE: an efficient clustering algorithm for large databases , 1998, SIGMOD '98.

[22]  Hans-Peter Seidel,et al.  A data-driven approach for real-time full body pose reconstruction from a depth camera , 2011, 2011 International Conference on Computer Vision.

[23]  Sebastian Thrun,et al.  Real-time identification and localization of body parts from depth images , 2010, 2010 IEEE International Conference on Robotics and Automation.

[24]  Hans-Peter Seidel,et al.  Motion capture using joint skeleton tracking and surface estimation , 2009, CVPR.

[25]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[26]  Qi Zhao,et al.  Semantic saliency driven camera control for personal remote collaboration , 2008, 2008 IEEE 10th Workshop on Multimedia Signal Processing.