A data-driven approach for real-time full body pose reconstruction from a depth camera

In recent years, depth cameras have become a widely available sensor type that captures depth images at real-time frame rates. Even though recent approaches have shown that 3D pose estimation from monocular 2.5D depth images has become feasible, there are still challenging problems due to strong noise in the depth data and self-occlusions in the motions being captured. In this paper, we present an efficient and robust pose estimation framework for tracking full-body motions from a single depth image stream. Following a data-driven hybrid strategy that combines local optimization with global retrieval techniques, we contribute several technical improvements that lead to speed-ups of an order of magnitude compared to previous approaches. In particular, we introduce a variant of Dijkstra's algorithm to efficiently extract pose features from the depth data and describe a novel late-fusion scheme based on an efficiently computable sparse Hausdorff distance to combine local and global pose estimates. Our experiments show that the combination of these techniques facilitates real-time tracking with stable results even for fast and complex motions, making it applicable to a wide range of inter-active scenarios.

[1]  Hans-Peter Seidel,et al.  Fast articulated motion tracking using a sums of Gaussians body model , 2011, 2011 International Conference on Computer Vision.

[2]  Linda G. Shapiro,et al.  Computer Vision , 2001 .

[3]  Nassir Navab,et al.  Estimating human 3D pose from Time-of-Flight images based on geodesic distances and optical flow , 2011, Face and Gesture 2011.

[4]  Janne Heikkilä,et al.  A four-step camera calibration procedure with implicit image correction , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[5]  Rómer Rosales,et al.  Combining Generative and Discriminative Models in a Framework for Articulated Pose Estimation , 2006, International Journal of Computer Vision.

[6]  Jinxiang Chai,et al.  VideoMocap: modeling physically realistic human motion from monocular video sequences , 2010, SIGGRAPH 2010.

[7]  Behzad Dariush,et al.  Kinematic self retargeting: A framework for human pose estimation , 2010, Comput. Vis. Image Underst..

[8]  Craig Gotsman,et al.  Articulated Object Reconstruction and Markerless Motion Capture from Depth Video , 2008, Comput. Graph. Forum.

[9]  Vincent Lepetit,et al.  From Canonical Poses to 3D Motion Capture Using a Single Camera , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Andrew W. Fitzgibbon,et al.  Efficient regression of general-activity human poses from depth images , 2011, 2011 International Conference on Computer Vision.

[11]  Trevor Darrell,et al.  Fast pose estimation with parameter-sensitive hashing , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[12]  Rüdiger Dillmann,et al.  Fusion of 2d and 3d sensor data for articulated body tracking , 2009, Robotics Auton. Syst..

[13]  Cristian Sminchisescu,et al.  Twin Gaussian Processes for Structured Prediction , 2010, International Journal of Computer Vision.

[14]  Behzad Dariush,et al.  Controlled human pose estimation from depth image streams , 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[15]  Trevor Darrell,et al.  Avoiding the "streetlight effect": tracking by exploring likelihood modes , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[16]  Tamim Asfour,et al.  Robust real-time stereo-based markerless human motion capture , 2008, Humanoids 2008 - 8th IEEE-RAS International Conference on Humanoid Robots.

[17]  John P. Lewis,et al.  Pose Space Deformation: A Unified Approach to Shape Interpolation and Skeleton-Driven Deformation , 2000, SIGGRAPH.

[18]  Jovan Popović,et al.  Real-time hand-tracking with a color glove , 2009, SIGGRAPH 2009.

[19]  Jitendra Malik,et al.  Twist Based Acquisition and Tracking of Animal and Human Kinematics , 2004, International Journal of Computer Vision.

[20]  R. K. Shyamasundar,et al.  Introduction to algorithms , 1996 .

[21]  Ian D. Reid,et al.  Articulated Body Motion Capture by Stochastic Search , 2005, International Journal of Computer Vision.

[22]  Reinhard Koch,et al.  Time-of-Flight sensor calibration for accurate range sensing , 2010, Comput. Vis. Image Underst..

[23]  Michael J. Black,et al.  Combined discriminative and generative articulated pose and non-rigid shape estimation , 2007, NIPS.

[24]  Paul J. Besl,et al.  A Method for Registration of 3-D Shapes , 1992, IEEE Trans. Pattern Anal. Mach. Intell..

[25]  Richard M. Murray,et al.  A Mathematical Introduction to Robotic Manipulation , 1994 .

[26]  Adrian Hilton,et al.  A survey of advances in vision-based human motion capture and analysis , 2006, Comput. Vis. Image Underst..

[27]  Gérard G. Medioni,et al.  Human pose estimation from a single view point, real-time range sensor , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[28]  Michael J. Black,et al.  Home 3D body scans from noisy image and range data , 2011, 2011 International Conference on Computer Vision.

[29]  Adolfo López,et al.  Real-time upper body tracking with online initialization using a range sensor , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[30]  Reinhard Koch,et al.  Time‐of‐Flight Cameras in Computer Graphics , 2010, Comput. Graph. Forum.

[31]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[32]  Björn Stenger,et al.  A Single Camera Motion Capture System for Human-Computer Interaction , 2008, IEICE Trans. Inf. Syst..

[33]  Hans-Peter Seidel,et al.  Markerless motion capture of man-machine interaction , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Reinhard Koch,et al.  Single View Motion Tracking by Depth and Silhouette Information , 2007, SCIA.

[35]  Amit Bleiweiss,et al.  Markerless motion capture using a single depth sensor , 2009, SIGGRAPH ASIA '09.

[36]  Nassir Navab,et al.  Manifold Learning for ToF-based Human Body Tracking and Activity Recognition , 2010, BMVC.

[37]  Sebastian Thrun,et al.  Real time motion capture using a single time-of-flight camera , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[38]  Danica Kragic,et al.  Hands in action: real-time 3D reconstruction of hands in interaction with objects , 2010, 2010 IEEE International Conference on Robotics and Automation.

[39]  Raquel Urtasun,et al.  Combining discriminative and generative methods for 3D deformable surface and articulated pose reconstruction , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[40]  Hans-Peter Seidel,et al.  Stabilizing motion tracking using retrieved motion priors , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[41]  Ruigang Yang,et al.  Accurate 3D pose estimation from a single depth image , 2011, 2011 International Conference on Computer Vision.

[42]  Michael J. Black,et al.  Detailed Human Shape and Pose from Images , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[43]  Stefano Soatto,et al.  Relevant Feature Selection for Human Pose Estimation and Localization in Cluttered Images , 2008, ECCV.

[44]  Kenny Erleben,et al.  GPU Accelerated Likelihoods for Stereo-Based Articulated Tracking , 2010, ECCV Workshops.

[45]  Ronald Poppe,et al.  A survey on vision-based human action recognition , 2010, Image Vis. Comput..

[46]  Rómer Rosales,et al.  Inferring body pose without tracking body parts , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[47]  Azriel Rosenfeld,et al.  Computer Vision , 1988, Adv. Comput..

[48]  Reinhard Koch,et al.  Time-of-Flight Sensors in Computer Graphics , 2009, Eurographics.

[49]  David J. Fleet,et al.  Physics-Based Person Tracking Using the Anthropomorphic Walker , 2010, International Journal of Computer Vision.

[50]  Sebastian Thrun,et al.  Real-time identification and localization of body parts from depth images , 2010, 2010 IEEE International Conference on Robotics and Automation.

[51]  Andrew Blake,et al.  Articulated body motion capture by annealed particle filtering , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[52]  Hans-Peter Seidel,et al.  Motion capture using joint skeleton tracking and surface estimation , 2009, CVPR.

[53]  Hans-Peter Seidel,et al.  Motion capture using joint skeleton tracking and surface estimation , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[54]  Michael J. Black,et al.  Estimating human shape and pose from a single image , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[55]  Hans-Peter Seidel,et al.  Multilinear pose and body shape estimation of dressed subjects from image sets , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[56]  Luc Van Gool,et al.  Functional categorization of objects using real-time markerless motion capture , 2011, CVPR 2011.