Combining Human Body Shape and Pose Estimation for Robust Upper Body Tracking Using a Depth Sensor

Rapid and accurate estimation of a person’s upper body shape and real-time tracking of the pose in the presence of occlusions is crucial for many future assistive technologies, health care applications and telemedicine systems. We propose to tackle this challenging problem by combining data-driven and generative methods for both body shape and pose estimation. Our strategy comprises a subspace-based method to predict body shape directly from a single depth map input, and a random forest regression approach to obtain a sound initialization for pose estimation of the upper body. We propose a model-fitting strategy in order to refine the estimated body shape and to exploit body shape information for improving pose accuracy. During tracking, we feed refinement results back into the forest-based joint position regressor to stabilize and accelerate pose estimation over time. Our tracking framework is designed to cope with viewpoint limitations and occlusions due to dynamic objects.

[1]  Sehoon Ha,et al.  Iterative Training of Dynamic Skills Inspired by Human Coaching Techniques , 2014, ACM Trans. Graph..

[2]  Michael J. Black,et al.  Detailed Full-Body Reconstructions of Moving People from Monocular RGB-D Sequences , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[3]  Jian Sun,et al.  Global refinement of random forest , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Shahram Izadi,et al.  Modeling Kinect Sensor Noise for Improved 3D Reconstruction and Tracking , 2012, 2012 Second International Conference on 3D Imaging, Modeling, Processing, Visualization & Transmission.

[5]  Hans-Peter Seidel,et al.  Motion capture using joint skeleton tracking and surface estimation , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Björn Stenger,et al.  Human Body Shape Estimation Using a Multi-resolution Manifold Forest , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Sebastian Thrun,et al.  Real time motion capture using a single time-of-flight camera , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[8]  Andrew W. Fitzgibbon,et al.  The Vitruvian manifold: Inferring dense correspondences for one-shot human pose estimation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Min Sun,et al.  Conditional regression forests for human pose estimation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Yang Li,et al.  Measuring Accurate Body Parameters of Dressed Humans with Large-Scale Motion Using a Kinect Sensor , 2013, Sensors.

[11]  Ligang Liu,et al.  Scanning 3D Full Human Bodies Using Kinects , 2012, IEEE Transactions on Visualization and Computer Graphics.

[12]  Bernt Schiele,et al.  Building statistical shape spaces for 3D human modeling , 2015, Pattern Recognit..

[13]  Ruigang Yang,et al.  Accurate 3D pose estimation from a single depth image , 2011, 2011 International Conference on Computer Vision.

[14]  Lena Maier-Hein,et al.  Real-Time Range Imaging in Health Care: A Survey , 2013, Time-of-Flight and Depth Imaging.

[15]  Michael J. Black,et al.  Dyna: a model of dynamic human shape in motion , 2015, ACM Trans. Graph..

[16]  Kathleen M. Robinette,et al.  The CAESAR project: a 3-D surface anthropometry survey , 1999, Second International Conference on 3-D Digital Imaging and Modeling (Cat. No.PR00062).

[17]  Michael J. Black,et al.  SMPL: A Skinned Multi-Person Linear Model , 2023 .

[18]  Dieter Fox,et al.  DynamicFusion: Reconstruction and tracking of non-rigid scenes in real-time , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Andrew W. Fitzgibbon,et al.  KinectFusion: Real-time dense surface mapping and tracking , 2011, 2011 10th IEEE International Symposium on Mixed and Augmented Reality.

[20]  Sebastian Thrun,et al.  SCAPE: shape completion and animation of people , 2005, SIGGRAPH '05.

[21]  Hans-Peter Seidel,et al.  A Statistical Model of Human Pose and Body Shape , 2009, Comput. Graph. Forum.

[22]  Ho Yub Jung,et al.  Random tree walk toward instantaneous 3D human pose estimation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Bo Fu,et al.  Quality Dynamic Human Body Modeling Using a Single Low-Cost Depth Camera , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Michael J. Black,et al.  Home 3D body scans from noisy image and range data , 2011, 2011 International Conference on Computer Vision.

[25]  Sebastian Thrun,et al.  Real-Time Human Pose Tracking from Range Data , 2012, ECCV.

[26]  Reinhard Koch,et al.  Single View Motion Tracking by Depth and Silhouette Information , 2007, SCIA.

[27]  Ming Zeng,et al.  Templateless Quasi-rigid Shape Modeling with Implicit Loop-Closure , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Hans-Peter Seidel,et al.  A data-driven approach for real-time full body pose reconstruction from a depth camera , 2011, 2011 International Conference on Computer Vision.

[29]  Didier Stricker,et al.  KinectAvatar: Fully Automatic Body Capture Using a Single Kinect , 2012, ACCV Workshops.

[30]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[31]  Andrew W. Fitzgibbon,et al.  Efficient regression of general-activity human poses from depth images , 2011, 2011 International Conference on Computer Vision.

[32]  Michael J. Black,et al.  The stitched puppet: A graphical model of 3D human shape and pose , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Hans-Peter Seidel,et al.  Personalization and Evaluation of a Real-Time Depth-Based Full Body Tracker , 2013, 2013 International Conference on 3D Vision.