User-Specific Hand Modeling from Monocular Depth Sequences

This paper presents a method for acquiring dense nonrigid shape and deformation from a single monocular depth sensor. We focus on modeling the human hand, and assume that a single rough template model is available. We combine and extend existing work on model-based tracking, subdivision surface fitting, and mesh deformation to acquire detailed hand models from as few as 15 frames of depth data. We propose an objective that measures the error of fit between each sampled data point and a continuous model surface defined by a rigged control mesh, and that uses as-rigid-as-possible (ARAP) regularizers to cleanly separate the model and template geometries. A key contribution is our use of a smooth model based on subdivision surfaces, which allows simultaneous optimization over both correspondences and model parameters. This avoids the use of iterated closest point (ICP) algorithms, which often converge slowly. Automatic initialization is obtained using a regression forest trained to infer approximate correspondences. Experiments show that the resulting meshes model the user's hand shape more accurately than just adapting the shape parameters of the skeleton, and that the retargeted skeleton accurately models the user's articulations. We investigate the effect of various modeling choices, and show the benefits of using subdivision surfaces and ARAP regularization.
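The idea of optimizing correspondences and model parameters simultaneously, rather than alternating between them as ICP does, can be illustrated on a toy problem. The sketch below is not the method of the paper: it assumes a 2D circle as a stand-in for the smooth subdivision surface, uses plain scaled gradient descent rather than whatever solver the authors employ, omits the ARAP regularizers, and initializes correspondences from angles about the initial center instead of a regression forest. The function name `fit_circle_jointly` and all parameters are hypothetical.

```python
import math


def fit_circle_jointly(points, center, radius, steps=2000, step_size=0.3):
    """Toy joint fit: refine circle parameters AND per-point correspondences.

    Each data point x_i gets a correspondence u_i (an angle on the circle).
    The objective is sum_i || S(u_i; cx, cy, r) - x_i ||^2, where
    S(u) = (cx + r cos u, cy + r sin u) is a smooth model "surface".
    Every iteration updates cx, cy, r and all u_i together.
    """
    cx, cy = center
    r = radius
    n = len(points)
    # Assumed initialization: angle of each point about the initial center.
    u = [math.atan2(y - cy, x - cx) for x, y in points]

    for _ in range(steps):
        g_cx = g_cy = g_r = 0.0
        new_u = []
        for (x, y), ui in zip(points, u):
            c, s = math.cos(ui), math.sin(ui)
            # Residual between the model point S(u_i) and the data point.
            ex = cx + r * c - x
            ey = cy + r * s - y
            # Accumulate (half) gradients with respect to the model parameters.
            g_cx += ex
            g_cy += ey
            g_r += ex * c + ey * s
            # The tangential residual slides the correspondence along the curve.
            tangential = -ex * s + ey * c
            new_u.append(ui - step_size * tangential / r)
        # Model parameters and correspondences are updated in the same step.
        cx -= step_size * g_cx / n
        cy -= step_size * g_cy / n
        r -= step_size * g_r / n
        u = new_u
    return (cx, cy), r, u


# Noise-free samples from a circle with center (1.0, -0.5) and radius 2.0.
samples = [
    (1.0 + 2.0 * math.cos(0.1 + k * math.pi / 6),
     -0.5 + 2.0 * math.sin(0.1 + k * math.pi / 6))
    for k in range(12)
]
fitted_center, fitted_radius, _ = fit_circle_jointly(samples, (0.0, 0.0), 1.0)
print(fitted_center, fitted_radius)
```

Because the model curve is differentiable in `u`, the correspondences can be treated as ordinary continuous unknowns alongside the shape parameters. An ICP-style scheme would instead recompute closest points with the parameters frozen and then refit the parameters with the correspondences frozen.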
