High-Quality Textured 3D Shape Reconstruction with Cascaded Fully Convolutional Networks

We present a learning-based approach to reconstructing high-resolution three-dimensional (3D) shapes with detailed geometry and high-fidelity textures. Although extensively studied, algorithms for 3D reconstruction from multi-view depth-and-color (RGB-D) scans remain sensitive to measurement noise and occlusions, and limited scanning or capturing angles often lead to incomplete reconstructions. Building on recent advances in 3D deep learning, we introduce a novel computation- and memory-efficient cascaded 3D convolutional network architecture that learns to reconstruct implicit surface representations, together with the corresponding color information, from noisy and imperfect RGB-D maps. The proposed network performs reconstruction progressively, in a coarse-to-fine manner, achieving unprecedented output resolution and fidelity. We also develop an algorithm for end-to-end training of the proposed cascaded structure. We further introduce Human10, a newly created dataset containing detailed, textured full-body reconstructions and the corresponding raw RGB-D scans of 10 subjects. Qualitative and quantitative experiments on both synthetic and real-world datasets demonstrate that the presented approach outperforms existing state-of-the-art work in the visual quality and accuracy of the reconstructed models.
