Dynamic FAUST: Registering Human Bodies in Motion

While the ready availability of 3D scan data has influenced research throughout computer vision, less attention has focused on 4D data, that is, 3D scans of moving non-rigid objects captured over time. To be useful for vision research, such 4D scans need to be registered, or aligned, to a common topology. Consequently, extending mesh registration methods to 4D is important. Unfortunately, no ground-truth datasets are available for quantitative evaluation and comparison of 4D registration methods. To address this, we create a novel dataset of high-resolution 4D scans of human subjects in motion, captured at 60 fps. We propose a new mesh registration method that uses both 3D geometry and texture information to register all scans in a sequence to a common reference topology. The approach exploits consistency in texture over both short and long time intervals and deals with temporal offsets between shape and texture capture. We show that using geometry alone results in significant alignment errors when the motions are fast and non-rigid. We evaluate the accuracy of our registration and provide a dataset of 40,000 raw and aligned meshes. Dynamic FAUST extends the popular FAUST dataset to dynamic 4D data, and is available for research purposes at http://dfaust.is.tue.mpg.de.
