HybridFusion: Real-Time Performance Capture Using a Single Depth Sensor and Sparse IMUs

We propose a light-weight yet highly robust method for real-time human performance capture based on a single depth camera and sparse inertial measurement units (IMUs). Our method combines non-rigid surface tracking and volumetric fusion to simultaneously reconstruct challenging motions, detailed geometries and the inner human body of a clothed subject. The proposed hybrid motion tracking algorithm and efficient per-frame sensor calibration technique enable non-rigid surface reconstruction for fast motions and challenging poses with severe occlusions. Significant fusion artifacts are reduced using a new confidence measurement for our adaptive TSDF-based fusion. The above contributions are mutually beneficial in our reconstruction system, which enable practical human performance capture that is real-time, robust, low-cost and easy to deploy. Experiments show that extremely challenging performances and loop closure problems can be handled successfully.

[1]  Hans-Peter Seidel,et al.  Animation cartography—intrinsic reconstruction of shape and motion , 2012, TOGS.

[2]  Shahram Izadi,et al.  Motion2fusion , 2017, ACM Trans. Graph..

[3]  Qionghai Dai,et al.  Robust Non-rigid Motion Tracking and Surface Reconstruction Using L0 Regularization , 2015, ICCV.

[4]  Wojciech Matusik,et al.  Practical motion capture in everyday surroundings , 2007, SIGGRAPH 2007.

[5]  Leonidas J. Guibas,et al.  Robust single-view geometry and motion reconstruction , 2009, ACM Trans. Graph..

[6]  Hans-Peter Seidel,et al.  Motion reconstruction using sparse accelerometer data , 2011, TOGS.

[7]  Danping Zou,et al.  CoSLAM: Collaborative Visual SLAM in Dynamic Environments , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Jean-Yves Guillemaut,et al.  General Dynamic Scene Reconstruction from Multiple View Video , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[9]  Daniel Cremers,et al.  KillingFusion: Non-rigid 3D Reconstruction without Correspondences , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Dimitris Samaras,et al.  EyeOpener: Editing Eyes in the Wild , 2017, ACM Trans. Graph..

[11]  Bodo Rosenhahn,et al.  Supplementary Material to: Recovering Accurate 3D Human Pose in The Wild Using IMUs and a Moving Camera , 2018 .

[12]  William E. Lorensen,et al.  Marching cubes: A high resolution 3D surface construction algorithm , 1987, SIGGRAPH.

[13]  Pushmeet Kohli,et al.  Fusion4D , 2016, ACM Trans. Graph..

[14]  Leonidas J. Guibas,et al.  Dynamic geometry registration , 2007, Symposium on Geometry Processing.

[15]  Edmond Boyer,et al.  Multi-view Dynamic Shape Refinement Using Local Temporal Integration , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[16]  Craig Gotsman,et al.  Articulated Object Reconstruction and Markerless Motion Capture from Depth Video , 2008, Comput. Graph. Forum.

[17]  Yaser Sheikh,et al.  Total Capture: A 3D Deformation Model for Tracking Faces, Hands, and Bodies , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[18]  Christian Theobalt,et al.  MonoPerfCap , 2017, ACM Trans. Graph..

[19]  Bodo Rosenhahn,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence Combined Region-and Motion-based 3d Tracking of Rigid and Articulated Objects , 2022 .

[20]  Günther Greiner,et al.  Reconstructing Animated Meshes from Time‐Varying Point Clouds , 2008, Comput. Graph. Forum.

[21]  Michael J. Black,et al.  ClothCap , 2017, ACM Trans. Graph..

[22]  Peter V. Gehler,et al.  Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image , 2016, ECCV.

[23]  Matthias Nießner,et al.  VolumeDeform: Real-Time Volumetric Non-rigid Reconstruction , 2016, ECCV.

[24]  Slobodan Ilic,et al.  SobolevFusion: 3D Reconstruction of Scenes Undergoing Free Non-rigid Motion , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[25]  Hans-Peter Seidel,et al.  Real-Time Body Tracking with One Depth Camera and Inertial Sensors , 2013, 2013 IEEE International Conference on Computer Vision.

[26]  Qionghai Dai,et al.  DoubleFusion: Real-Time Capture of Human Performances with Inner Body Shapes from a Single Depth Sensor , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[27]  Tao Yu,et al.  Real-time geometry, albedo and motion reconstruction using a single RGBD camera , 2017, TOGS.

[28]  Jessica K. Hodgins,et al.  Action capture with accelerometers , 2008, SCA '08.

[29]  Hans-Peter Seidel,et al.  Outdoor human motion capture using inverse kinematics and von mises-fisher sampling , 2011, 2011 International Conference on Computer Vision.

[30]  Bodo Rosenhahn,et al.  Sparse Inertial Poser: Automatic 3D Human Pose Estimation from Sparse IMUs , 2017, Comput. Graph. Forum.

[31]  Bodo Rosenhahn,et al.  Analyzing and Evaluating Markerless Motion Tracking Using Inertial Sensors , 2010, ECCV Workshops.

[32]  Hans-Peter Seidel,et al.  Motion capture using joint skeleton tracking and surface estimation , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Tao Yu,et al.  BodyFusion: Real-Time Capture of Human Motion and Surface Geometry Using a Single Depth Camera , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[34]  Hans-Peter Seidel,et al.  Markerless Motion Capture of Multiple Characters Using Multiview Image Segmentation , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Michael J. Black,et al.  SMPL: A Skinned Multi-Person Linear Model , 2023 .

[36]  Dieter Fox,et al.  DynamicFusion: Reconstruction and tracking of non-rigid scenes in real-time , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Wojciech Matusik,et al.  Articulated mesh animation from multi-view silhouettes , 2008, ACM Trans. Graph..

[38]  Huamin Wang,et al.  Modeling deformable objects from a single depth camera , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[39]  Andrew W. Fitzgibbon,et al.  Online generative model personalization for hand tracking , 2017, ACM Trans. Graph..

[40]  Derek Bradley,et al.  Markerless garment capture , 2008, SIGGRAPH 2008.

[41]  Sebastian Thrun,et al.  SCAPE: shape completion and animation of people , 2005, SIGGRAPH 2005.

[42]  Andrew W. Fitzgibbon,et al.  Real-time non-rigid reconstruction using an RGB-D camera , 2014, ACM Trans. Graph..

[43]  Bodo Rosenhahn,et al.  Multisensor-fusion for 3D full-body human motion capture , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[44]  J. Collomosse,et al.  Real-Time Full-Body Motion Capture from Video and IMUs , 2017, 2017 International Conference on 3D Vision (3DV).

[45]  Michael J. Black,et al.  Dyna: a model of dynamic human shape in motion , 2015, ACM Trans. Graph..

[46]  Qionghai Dai,et al.  Performance Capture of Interacting Characters with Handheld Kinects , 2012, ECCV.

[47]  M. Pauly,et al.  Embedded deformation for shape manipulation , 2007, SIGGRAPH 2007.

[48]  David A. Forsyth,et al.  Generalizing motion edits with Gaussian processes , 2009, ACM Trans. Graph..