Total Capture: A 3D Deformation Model for Tracking Faces, Hands, and Bodies

We present a unified deformation model for the markerless capture of human movement at multiple scales, including facial expressions, body motion, and hand gestures. An initial model is generated by locally stitching together models of the individual parts of the human body, which we refer to as "Frank". This model enables the full expression of part movements, including face and hands, by a single seamless model. We capture a dataset of people wearing everyday clothes and optimize the Frank model to create "Adam": a calibrated model that shares the same skeleton hierarchy as the initial model with a simpler parameterization. Finally, we demonstrate the use of these models for total motion tracking in a multiview setup, simultaneously capturing the large-scale body movements and the subtle face and hand motion of a social group of people.

[1]  Jihun Yu,et al.  Realtime facial animation with on-the-fly correctives , 2013, ACM Trans. Graph..

[2]  Michael J. Black,et al.  Dyna: a model of dynamic human shape in motion , 2015, ACM Trans. Graph..

[3]  Takeo Kanade,et al.  Panoptic Studio: A Massively Multiview System for Social Motion Capture , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[4]  Varun Ramakrishna,et al.  Convolutional Pose Machines , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Li Cheng,et al.  Efficient Hand Pose Estimation from a Single Depth Image , 2013, 2013 IEEE International Conference on Computer Vision.

[6]  Luc Van Gool,et al.  Motion Capture of Hands in Action Using Discriminative Salient Points , 2012, ECCV.

[7]  Christian Theobalt,et al.  Reconstructing detailed dynamic face geometry from monocular video , 2013, ACM Trans. Graph..

[8]  Michael J. Black,et al.  Detailed, Accurate, Human Shape Estimation from Clothed 3D Scan Sequences , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Takeo Kanade,et al.  Shape-From-Silhouette Across Time Part I: Theory and Algorithms , 2005, International Journal of Computer Vision.

[10]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[11]  Jean Ponce,et al.  Dense 3D motion capture from synchronized video streams , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Thabo Beeler,et al.  High-quality single-shot capture of facial geometry , 2010, ACM Trans. Graph..

[13]  Antti Oulasvirta,et al.  Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data , 2013, 2013 IEEE International Conference on Computer Vision.

[14]  Michael J. Black,et al.  ClothCap , 2017, ACM Trans. Graph..

[15]  Hans-Peter Seidel,et al.  Motion capture using joint skeleton tracking and surface estimation , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Yaser Sheikh,et al.  Hand Keypoint Detection in Single Images Using Multiview Bootstrapping , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Jovan Popović,et al.  Dynamic shape capture using multi-view photometric stereo , 2009, SIGGRAPH 2009.

[18]  Hans-Peter Seidel,et al.  A data-driven approach for real-time full body pose reconstruction from a depth camera , 2011, 2011 International Conference on Computer Vision.

[19]  Hans-Peter Seidel,et al.  VNect , 2017, ACM Trans. Graph..

[20]  W. Heidrich,et al.  High resolution passive facial performance capture , 2010, ACM Trans. Graph..

[21]  Emiliano Gambaretto,et al.  Markerless Motion Capture through Visual Hull, Articulated ICP and Subject Specific Model Generation , 2010, International Journal of Computer Vision.

[22]  Justus Thies,et al.  Face2Face: real-time face capture and reenactment of RGB videos , 2019, Commun. ACM.

[23]  Yiying Tong,et al.  FaceWarehouse: A 3D Facial Expression Database for Visual Computing , 2014, IEEE Transactions on Visualization and Computer Graphics.

[24]  Tae-Kyun Kim,et al.  Latent Regression Forest: Structured Estimation of 3D Articulated Hand Posture , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Hans-Peter Seidel,et al.  Performance capture from sparse multi-view video , 2008, ACM Trans. Graph..

[26]  H. Woltring,et al.  New possibilities for human motion studies by real-time light spot position measurement. , 1974, Biotelemetry.

[27]  Wei Zhang,et al.  Deep Kinematic Pose Regression , 2016, ECCV Workshops.

[28]  Woltring Hj,et al.  New possibilities for human motion studies by real-time light spot position measurement. , 1974 .

[29]  Marc Pollefeys,et al.  Capturing Hands in Action Using Discriminative Salient Points and Physics Simulation , 2015, International Journal of Computer Vision.

[30]  Luc Van Gool,et al.  Direction matters: hand pose estimation from local surface normals , 2016, ArXiv.

[31]  Hans-Peter Seidel,et al.  Fast articulated motion tracking using a sums of Gaussians body model , 2011, 2011 International Conference on Computer Vision.

[32]  Jinxiang Chai,et al.  Combining marker-based mocap and RGB-D camera for acquiring high-fidelity hand motion data , 2012, SCA '12.

[33]  Jian Sun,et al.  Cascaded hand pose regression , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Dimitrios Tzionas,et al.  Embodied hands , 2017, ACM Trans. Graph..

[35]  Paul E. Debevec,et al.  Multiview face capture using polarized spherical gradient illumination , 2011, ACM Trans. Graph..

[36]  Hans-Peter Seidel,et al.  Markerless Motion Capture of Multiple Characters Using Multiview Image Segmentation , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37]  Thabo Beeler,et al.  Real-time high-fidelity facial performance capture , 2015, ACM Trans. Graph..

[38]  Yaser Sheikh,et al.  MAP Visibility Estimation for Large-Scale Dynamic 3D Reconstruction , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  Peter V. Gehler,et al.  Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image , 2016, ECCV.

[40]  Antti Oulasvirta,et al.  Fast and robust hand tracking using detection-guided optimization , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Xiaowei Zhou,et al.  3D Shape Reconstruction from 2D Landmarks: A Convex Formulation , 2014, ArXiv.

[42]  Takeo Kanade,et al.  Panoptic Studio: A Massively Multiview System for Social Interaction Capture , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[43]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[44]  Jonathan Tompson,et al.  Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation , 2014, NIPS.

[45]  Larry S. Davis,et al.  Tracking of humans in action: a 3-D model-based approach , 1996 .

[46]  Bodo Rosenhahn,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence Combined Region-and Motion-based 3d Tracking of Rigid and Articulated Objects , 2022 .

[47]  Antonis A. Argyros,et al.  Tracking the articulated motion of two strongly interacting hands , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[48]  Derek Bradley,et al.  An anatomically-constrained local deformation model for monocular face capture , 2016, ACM Trans. Graph..

[49]  Jitendra Malik,et al.  Twist Based Acquisition and Tracking of Animal and Human Kinematics , 2004, International Journal of Computer Vision.

[50]  Derek Bradley,et al.  High-quality passive facial performance capture using anchor frames , 2011, ACM Trans. Graph..

[51]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[52]  Hans-Peter Seidel,et al.  Lightweight binocular facial performance capture under uncontrolled lighting , 2012, ACM Trans. Graph..

[53]  Jessica K. Hodgins,et al.  Capturing and animating skin deformation in human motion , 2006, SIGGRAPH '06.

[54]  Andrew W. Fitzgibbon,et al.  Accurate, Robust, and Flexible Real-time Hand Tracking , 2015, CHI.

[55]  Lourdes Agapito,et al.  Lifting from the Deep: Convolutional 3D Pose Estimation from a Single Image , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Jonathan Tompson,et al.  Efficient ConvNet-based marker-less motion capture in general scenes with a low number of cameras , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Michael J. Black,et al.  SMPL: A Skinned Multi-Person Linear Model , 2023 .

[58]  Wojciech Matusik,et al.  Articulated mesh animation from multi-view silhouettes , 2008, ACM Trans. Graph..

[59]  Luc Van Gool,et al.  Markerless tracking of complex human motions from multiple views , 2006, Comput. Vis. Image Underst..

[60]  Sebastian Thrun,et al.  SCAPE: shape completion and animation of people , 2005, SIGGRAPH '05.

[61]  Qi Ye,et al.  Spatial Attention Deep Net with Partial PSO for Hierarchical Hybrid Hand Pose Estimation , 2016, ECCV.

[62]  R. Birdwhistell Kinesics and Context: Essays on Body Motion Communication , 1971 .

[63]  Lale Akarun,et al.  Hand Pose Estimation and Hand Shape Classification Using Multi-layered Randomized Decision Forests , 2012, ECCV.