Real-Time Simultaneous Pose and Shape Estimation for Articulated Objects Using a Single Depth Camera

In this paper we present a novel real-time algorithm for simultaneous pose and shape estimation for articulated objects, such as human beings and animals. The key of our pose estimation component is to embed the articulated deformation model with exponential-maps-based parametrization into a Gaussian Mixture Model. Benefiting from this probabilistic measurement model, our algorithm requires no explicit point correspondences as opposed to most existing methods. Consequently, our approach is less sensitive to local minimum and handles fast and complex motions well. Moreover, our novel shape adaptation algorithm based on the same probabilistic model automatically captures the shape of the subjects during the dynamic pose estimation process. The personalized shape model in turn improves the tracking accuracy. Furthermore, we propose novel approaches to use either a mesh model or a sphere-set model as the template for both pose and shape estimation under this unified framework. Extensive evaluations on publicly available data sets demonstrate that our method outperforms most state-of-the-art pose estimation algorithms with large margin, especially in the case of challenging motions. Furthermore, our shape estimation method achieves comparable accuracy with state of the arts, yet requires neither statistical shape model nor extra calibration procedure. Our algorithm is not only accurate but also fast, we have implemented the entire processing pipeline on GPU. It can achieve up to 60 frames per second on a middle-range graphics card.

[1]  Qionghai Dai,et al.  Performance Capture of Interacting Characters with Handheld Kinects , 2012, ECCV.

[2]  Hans-Peter Seidel,et al.  A data-driven approach for real-time full body pose reconstruction from a depth camera , 2011, 2011 International Conference on Computer Vision.

[3]  Bodo Rosenhahn,et al.  Model-Based Pose Estimation , 2011, Visual Analysis of Humans.

[4]  Didier Stricker,et al.  KinectAvatar: Fully Automatic Body Capture Using a Single Kinect , 2012, ACCV Workshops.

[5]  Hans-Peter Seidel,et al.  Motion capture using joint skeleton tracking and surface estimation , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Jonathan T. Barron,et al.  3D self-portraits , 2013, ACM Trans. Graph..

[7]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[8]  Horst Bischof,et al.  Simultaneous Shape and Pose Adaption of Articulated Models Using Linear Optimization , 2012, ECCV.

[9]  Yi Yang,et al.  Articulated Human Detection with Flexible Mixtures of Parts , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[11]  Ming Zeng,et al.  Templateless Quasi-rigid Shape Modeling with Implicit Loop-Closure , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Hans-Peter Seidel,et al.  Performance capture from sparse multi-view video , 2008, ACM Trans. Graph..

[13]  Sebastian Thrun,et al.  Real time motion capture using a single time-of-flight camera , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[14]  Pieter Peers,et al.  Temporally coherent completion of dynamic shapes , 2012, TOGS.

[15]  Andrew W. Fitzgibbon,et al.  The Vitruvian manifold: Inferring dense correspondences for one-shot human pose estimation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Adrian Hilton,et al.  A survey of advances in vision-based human motion capture and analysis , 2006, Comput. Vis. Image Underst..

[17]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[18]  Hans-Peter Seidel,et al.  A Statistical Model of Human Pose and Body Shape , 2009, Comput. Graph. Forum.

[19]  Michael J. Black,et al.  Detailed Full-Body Reconstructions of Moving People from Monocular RGB-D Sequences , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[20]  Pascal Fua,et al.  Articulated Soft Objects for Multiview Shape and Motion Capture , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[21]  Peter V. Gehler,et al.  Strong Appearance and Expressive Spatial Models for Human Pose Estimation , 2013, 2013 IEEE International Conference on Computer Vision.

[22]  Andriy Myronenko,et al.  Point Set Registration: Coherent Point Drift , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Ruigang Yang,et al.  Real-Time Simultaneous Pose and Shape Estimation for Articulated Objects Using a Single Depth Camera , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Ronald Poppe,et al.  Vision-based human motion analysis: An overview , 2007, Comput. Vis. Image Underst..

[25]  Jinxiang Chai,et al.  Accurate realtime full-body motion capture using a single depth camera , 2012, ACM Trans. Graph..

[26]  Michael J. Black,et al.  Home 3D body scans from noisy image and range data , 2011, 2011 International Conference on Computer Vision.

[27]  Matthias Zwicker,et al.  Global registration of dynamic range scans for articulated model reconstruction , 2011, TOGS.

[28]  Sebastian Thrun,et al.  Real-Time Human Pose Tracking from Range Data , 2012, ECCV.

[29]  Gareth Bradshaw Bounding volume hierarchies for level-of-detail collision handling , 2002 .

[30]  Sebastian Thrun,et al.  Real-time identification and localization of body parts from depth images , 2010, 2010 IEEE International Conference on Robotics and Automation.

[31]  Dieter Fox,et al.  DynamicFusion: Reconstruction and tracking of non-rigid scenes in real-time , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Luc Van Gool,et al.  Functional categorization of objects using real-time markerless motion capture , 2011, CVPR 2011.

[33]  Wojciech Matusik,et al.  Articulated mesh animation from multi-view silhouettes , 2008, ACM Trans. Graph..

[34]  Horst Bischof,et al.  Rapid Skin: Estimating the 3D Human Pose and Shape in Real-Time , 2012, 2012 Second International Conference on 3D Imaging, Modeling, Processing, Visualization & Transmission.

[35]  Sebastian Thrun,et al.  SCAPE: shape completion and animation of people , 2005, SIGGRAPH '05.

[36]  Richard M. Murray,et al.  A Mathematical Introduction to Robotic Manipulation , 1994 .

[37]  Ruigang Yang,et al.  Accurate 3D pose estimation from a single depth image , 2011, 2011 International Conference on Computer Vision.

[38]  Hans-Peter Seidel,et al.  Personalization and Evaluation of a Real-Time Depth-Based Full Body Tracker , 2013, 2013 International Conference on 3D Vision.

[39]  Didier Stricker,et al.  Algorithms for 3D Shape Scanning with a Depth Camera , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40]  Bernt Schiele,et al.  2D Human Pose Estimation: New Benchmark and State of the Art Analysis , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Luca Ballan,et al.  Marker-less motion capture of skinned models in a four camera set-up using optical flow and silhouettes , 2008 .

[42]  Szymon Rusinkiewicz,et al.  Global non-rigid alignment of 3-D scans , 2007, SIGGRAPH 2007.

[43]  Hans-Peter Seidel,et al.  Fast articulated motion tracking using a sums of Gaussians body model , 2011, 2011 International Conference on Computer Vision.

[44]  Marc Pollefeys,et al.  Photometric Bundle Adjustment for Dense Multi-view 3D Modeling , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[45]  Andrew W. Fitzgibbon,et al.  Metric Regression Forests for Human Pose Estimation , 2013, BMVC.

[46]  Antti Oulasvirta,et al.  Fast and robust hand tracking using detection-guided optimization , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Jitendra Malik,et al.  Twist Based Acquisition and Tracking of Animal and Human Kinematics , 2004, International Journal of Computer Vision.