Hybrid Modeling of Non-Rigid Scenes From RGBD Cameras

Recent advances in sensor technology have introduced low-cost RGB video plus depth sensors, such as the Kinect, which enable simultaneous acquisition of color and depth images at video rates. This paper introduces a framework for representing general dynamic scenes from video plus depth acquisition. A hybrid representation is proposed which combines the advantages of prior surfel-graph surface segmentation and modeling work with the higher-resolution surface reconstruction capability of volumetric fusion techniques. The contributions are: 1) extension of a prior piecewise surfel-graph modeling approach for improved accuracy and completeness; 2) combination of this surfel-graph modeling with truncated signed distance function (TSDF) surface fusion to generate dense geometry; and 3) a means of validating the reconstructed 4D scene model against the input data, with efficient storage of any unmodeled regions via residual depth maps. The approach allows arbitrary dynamic scenes to be represented efficiently with a temporally consistent structure and enhanced detail and completeness where possible, while gracefully falling back to raw measurements where no structure can be inferred. The representation is shown to facilitate creative manipulation of real scene data which would previously require more complex capture set-ups or manual processing.
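The dense-geometry contribution rests on standard TSDF fusion: each depth map is converted to truncated signed distances and merged into a voxel grid by weighted averaging, so that the fused surface lies at the zero crossing. The following is a minimal sketch of that update along a single camera ray; the voxel grid, truncation distance, and uniform per-frame weights are illustrative assumptions, not the paper's actual parameters.

```python
import numpy as np

def fuse_depth(tsdf, weight, depth, voxel_z, trunc=0.05):
    """Integrate one depth observation into a 1-D column of voxels along z.

    tsdf, weight : running TSDF value and accumulated weight per voxel
    depth        : measured surface depth for this ray (scalar)
    voxel_z      : z coordinate of each voxel center along the ray
    trunc        : truncation distance (illustrative value)
    """
    # Signed distance from each voxel to the measured surface,
    # truncated to [-trunc, trunc].
    sdf = np.clip(depth - voxel_z, -trunc, trunc)
    # Update only voxels in front of, or just behind, the surface;
    # voxels far behind the surface are left untouched (unobserved).
    mask = (depth - voxel_z) > -trunc
    w_new = 1.0  # uniform per-frame weight for this sketch
    tsdf[mask] = (tsdf[mask] * weight[mask] + sdf[mask] * w_new) / (
        weight[mask] + w_new)
    weight[mask] += w_new
    return tsdf, weight

# Usage: fuse two noisy depth observations of a surface at z = 0.5.
# The zero crossing of the fused TSDF settles near z = 0.5.
z = np.linspace(0.0, 1.0, 11)
tsdf = np.zeros_like(z)
w = np.zeros_like(z)
for d in (0.49, 0.51):
    tsdf, w = fuse_depth(tsdf, w, d, z)
```

In a full pipeline this per-ray update runs for every pixel of every frame, and a mesh is then extracted from the zero-level set (e.g. with marching cubes); the non-rigid case additionally warps voxels by the estimated scene motion before integration.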
