Paper Information - High-Quality Textured 3D Shape Reconstruction with Cascaded Fully Convolutional Networks

We present a learning-based approach to reconstructing high-resolution three-dimensional (3D) shapes with detailed geometry and high-fidelity textures. Albeit extensively studied, algorithms for 3D reconstruction from multi-view depth-and-color (RGB-D) scans are still prone to measurement noise and occlusions; limited scanning or capturing angles also often lead to incomplete reconstructions. Propelled by recent advances in 3D deep learning techniques, in this paper, we introduce a novel computation- and memory-efficient cascaded 3D convolutional network architecture, which learns to reconstruct implicit surface representations as well as the corresponding color information from noisy and imperfect RGB-D maps. The proposed 3D neural network performs reconstruction in a progressive and coarse-to-fine manner, achieving unprecedented output resolution and fidelity. Meanwhile, an algorithm for end-to-end training of the proposed cascaded structure is developed. We further introduce Human10, a newly created dataset containing both detailed and textured full-body reconstructions as well as corresponding raw RGB-D scans of 10 subjects. Qualitative and quantitative experimental results on both synthetic and real-world datasets demonstrate that the presented approach outperforms existing state-of-the-art work regarding visual quality and accuracy of reconstructed models.

[1] Matthias Nießner, et al. BundleFusion, 2016, TOGS.

[2] Pierre Alliez, et al. Voronoi-based Variational Reconstruction of Unoriented Point Sets, 2007, Eurographics Symposium on Geometry Processing.

[3] Jitendra Malik, et al. Hierarchical Surface Prediction for 3D Object Reconstruction, 2017, 2017 International Conference on 3D Vision (3DV).

[4] Karthik Ramani, et al. SurfNet: Generating 3D Shape Surfaces Using Deep Residual Networks, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5] Patrick Pérez, et al. Poisson image editing, 2003, ACM Trans. Graph.

[6] Olaf Kähler, et al. Real-Time Large-Scale Dense 3D Reconstruction with Loop Closure, 2016, ECCV.

[7] Horst Bischof, et al. OctNetFusion: Learning Depth Fusion from Data, 2017, 2017 International Conference on 3D Vision (3DV).

[8] Honglak Lee, et al. Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision, 2016, NIPS.

[9] Alvaro Collet, et al. High-quality streamable free-viewpoint video, 2015, ACM Trans. Graph.

[10] Vladlen Koltun, et al. Robust reconstruction of indoor scenes, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11] Andrew W. Fitzgibbon, et al. Image-Based Rendering Using Image-Based Priors, 2005, International Journal of Computer Vision.

[12] Stefan Leutenegger, et al. ElasticFusion: Dense SLAM Without A Pose Graph, 2015, Robotics: Science and Systems.

[13] Daniel Cohen-Or, et al. Seamless Montage for Texturing Models, 2010, Comput. Graph. Forum.

[14] Zhen Li, et al. High-Resolution Shape Completion Using Deep Neural Networks for Global Structure and Local Geometry Inference, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[15] Jan-Michael Frahm, et al. Deep blending for free-viewpoint image-based rendering, 2018, ACM Trans. Graph.

[16] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.

[17] Richard Szeliski, et al. The lumigraph, 1996, SIGGRAPH.

[18] George Drettakis, et al. Scalable inside-out image-based rendering, 2016, ACM Trans. Graph.

[19] Jean-Philippe Pons, et al. Robust piecewise-planar 3D reconstruction and completion from large-scale unstructured point data, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[20] Matthias Nießner, et al. Real-time 3D reconstruction at scale using voxel hashing, 2013, ACM Trans. Graph.

[21] Shi-Min Hu, et al. 3D indoor scene modeling from RGB-D data: a survey, 2015, Computational Visual Media.

[22] Jitendra Malik, et al. View Synthesis by Appearance Flow, 2016, ECCV.

[23] Silvio Savarese, et al. 3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction, 2016, ECCV.

[24] Bruno Lévy, et al. Least squares conformal maps for automatic texture atlas generation, 2002, ACM Trans. Graph.

[25] Steven M. Seitz, et al. LookinGood, 2018, ACM Trans. Graph.

[26] Simon Fuhrmann, et al. Fusion of depth maps with multiple scales, 2011, ACM Trans. Graph.

[27] Chao Yang, et al. Shape Inpainting Using 3D Generative Adversarial Network and Recurrent Convolutional Networks, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[28] Yang Liu, et al. O-CNN, 2017, ACM Trans. Graph.

[29] Charles T. Loop, et al. Holoportation: Virtual 3D Teleportation in Real-time, 2016, UIST.

[30] Jitendra Malik, et al. Multi-view Supervision for Single-View Reconstruction via Differentiable Ray Consistency, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31] Andrew W. Fitzgibbon, et al. KinectFusion: Real-time dense surface mapping and tracking, 2011, 2011 10th IEEE International Symposium on Mixed and Augmented Reality.

[32] Kiriakos N. Kutulakos, et al. A Theory of Shape by Space Carving, 2000, International Journal of Computer Vision.

[33] Daniel Cremers, et al. Robust odometry estimation for RGB-D cameras, 2013, 2013 IEEE International Conference on Robotics and Automation.

[34] Thomas Brox, et al. U-Net: Convolutional Networks for Biomedical Image Segmentation, 2015, MICCAI.

[35] Horst Bischof, et al. A Globally Optimal Algorithm for Robust TV-L1 Range Image Integration, 2007, 2007 IEEE 11th International Conference on Computer Vision.

[36] Thomas Brox, et al. Multi-view 3D Models from Single Images with a Convolutional Network, 2015, ECCV.

[37] Anita Sellent, et al. Floating Textures, 2008, Comput. Graph. Forum.

[38] Carsten Rother, et al. PatchMatch Stereo - Stereo Matching with Slanted Support Windows, 2011, BMVC.

[39] William E. Lorensen, et al. Marching cubes: A high resolution 3D surface construction algorithm, 1987, SIGGRAPH.

[40] Chongyang Ma, et al. Deep Volumetric Video From Very Sparse Multi-view Performance Capture, 2018, ECCV.

[41] Michael M. Kazhdan, et al. Screened poisson surface reconstruction, 2013, TOGS.

[42] Simon J. Julier, et al. Structured Prediction of Unobserved Voxels from a Single Depth Image, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43] Vladlen Koltun, et al. Color map optimization for 3D reconstruction with consumer depth cameras, 2014, ACM Trans. Graph.

[44] Daniel Cremers, et al. Real-time visual odometry from dense RGB-D images, 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[45] Kang Chen, et al. Automatic semantic modeling of indoor scenes from low-quality RGB-D data using contextual information, 2014, ACM Trans. Graph.

[46] Michael Bosse, et al. Unstructured lumigraph rendering, 2001, SIGGRAPH.

[47] Andrew W. Fitzgibbon, et al. Kinectrack: 3D Pose Estimation Using a Projected Dense Dot Pattern, 2014, IEEE Transactions on Visualization and Computer Graphics.

[48] Andrew I. Comport, et al. On unifying key-frame and voxel-based dense visual SLAM at large scales, 2013, 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[49] Shi-Min Hu, et al. Structure recovery by part assembly, 2012, ACM Trans. Graph.

[50] Gabriel Taubin, et al. SSD: Smooth Signed Distance Surface Reconstruction, 2011, Comput. Graph. Forum.

[51] Justus Thies, et al. IGNOR: Image-guided Neural Object Rendering, 2018, ArXiv.

[52] Adrian Hilton, et al. Volumetric performance capture from minimal camera viewpoints, 2018, ECCV.

[53] M. Gross, et al. Algebraic point set surfaces, 2007, ACM Trans. Graph.

[54] Jan-Michael Frahm, et al. 3D Reconstruction Using an n-Layer Heightmap, 2010, DAGM-Symposium.

[55] Jitendra Malik, et al. End-to-End Recovery of Human Shape and Pose, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[56] Luiz Velho, et al. Hermite Radial Basis Functions Implicits, 2011, Comput. Graph. Forum.

[57] Reinhard Klein, et al. Completion and Reconstruction with Primitive Shapes, 2009, Comput. Graph. Forum.

[58] Iasonas Kokkinos, et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[59] Thomas Brox, et al. 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation, 2016, MICCAI.

[60] Vladlen Koltun, et al. Fast MRF Optimization with Application to Depth Reconstruction, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[61] Thomas Brox, et al. Octree Generating Networks: Efficient Convolutional Architectures for High-resolution 3D Outputs, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[62] Steven M. Seitz, et al. Occluding Contours for Multi-view Stereo, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[63] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[64] Tim Weyrich, et al. Real-Time 3D Reconstruction in Dynamic Scenes Using Point-Based Fusion, 2013, 2013 International Conference on 3D Vision.

[65] Lu Fang, et al. SurfaceNet: An End-to-End 3D Neural Network for Multiview Stereopsis, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[66] Hiroshi Ishikawa, et al. Globally and locally consistent image completion, 2017, ACM Trans. Graph.

[67] Leonidas J. Guibas, et al. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[68] Shahram Izadi, et al. UltraStereo: Efficient Learning-Based Matching for Active Stereo Systems, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[69] Markus H. Gross, et al. Feature Preserving Point Set Surfaces based on Non‐Linear Kernel Regression, 2009, Comput. Graph. Forum.

[70] Adam Finkelstein, et al. PatchMatch: a randomized correspondence algorithm for structural image editing, 2009, SIGGRAPH 2009.

[71] Yizhou Yu, et al. Efficient View-Dependent Image-Based Rendering with Projective Texture-Mapping, 1998, Rendering Techniques.

[72] Pierre Alliez, et al. A Survey of Surface Reconstruction from Point Clouds, 2017, Comput. Graph. Forum.

[73] Michael J. Black, et al. SMPL: A Skinned Multi-Person Linear Model, 2023.

[74] Ersin Yumer, et al. Transformation-Grounded Image Generation Network for Novel 3D View Synthesis, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[75] Thomas A. Funkhouser, et al. Semantic Scene Completion from a Single Depth Image, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[76] Jianxiong Xiao, et al. 3D ShapeNets: A deep representation for volumetric shapes, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[77] Amitabh Varshney, et al. Montage4D: interactive seamless fusion of multiview video textures, 2018, I3D.

[78] Kun Zhou, et al. An interactive approach to semantic modeling of indoor scenes with an RGBD camera, 2012, ACM Trans. Graph.

[79] Jiajun Wu, et al. Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling, 2016, NIPS.

[80] Oliver Grau, et al. VConv-DAE: Deep Volumetric Shape Learning Without Object Labels, 2016, ECCV Workshops.

[81] Wolfram Burgard, et al. OctoMap: A Probabilistic, Flexible, and Compact 3D Map Representation for Robotic Systems, 2010.

[82] Marcus A. Magnor, et al. Video Based Reconstruction of 3D People Models, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[83] Bo Yang, et al. 3D Object Reconstruction from a Single Depth View with Adversarial Learning, 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[84] Gernot Riegler, et al. OctNet: Learning Deep 3D Representations at High Resolutions, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[85] Marc Levoy, et al. A volumetric method for building complex models from range images, 1996, SIGGRAPH.

[86] Jitendra Malik, et al. Modeling and Rendering Architecture from Photographs: A hybrid geometry- and image-based approach, 1996, SIGGRAPH.

[87] George Drettakis, et al. Depth synthesis and local warps for plausible image-based navigation, 2013, TOGS.

[88] Matthias Nießner, et al. Shape Completion Using 3D-Encoder-Predictor CNNs and Shape Synthesis, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[89] Cordelia Schmid, et al. BodyNet: Volumetric Inference of 3D Human Body Shapes, 2018, ECCV.

[90] Leonidas J. Guibas, et al. ShapeNet: An Information-Rich 3D Model Repository, 2015, ArXiv.

[91] Shi-Min Hu, et al. Learning to Reconstruct High-Quality 3D Shapes with Cascaded Fully Convolutional Networks, 2018, ECCV.

[92] Richard K. Beatson, et al. Reconstruction and representation of 3D objects with radial basis functions, 2001, SIGGRAPH.

[93] Hao Chen, et al. 3D deeply supervised network for automated segmentation of volumetric medical images, 2017, Medical Image Anal.

[94] Stefan Leutenegger, et al. ElasticFusion: Real-time dense SLAM and light source estimation, 2016, Int. J. Robotics Res.