Paper information - RGBD-Net: Predicting color and depth images for novel views synthesis

We address the problem of novel view synthesis from an unstructured set of reference images. We propose RGBD-Net, a method that predicts the depth map and the color image at the target pose in a multi-scale manner. The reference views are first warped to the target pose to obtain multi-scale plane sweep volumes, which are passed to the first module, a hierarchical depth regression network that predicts the depth map of the novel view. The second module, a depth-aware generator network, then refines the warped novel views and renders the final target image. Both networks can be trained with or without depth supervision. In experimental evaluation, RGBD-Net not only produces novel views of higher quality than previous state-of-the-art methods, but also yields depth maps that enable reconstruction of more accurate 3D point clouds than existing multi-view stereo methods. The results indicate that RGBD-Net generalizes well to previously unseen data.
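The warping step behind a plane sweep volume can be illustrated with the classical plane-induced homography. The sketch below is not taken from the paper: it assumes pinhole cameras, fronto-parallel depth planes in the target camera frame, and the pose convention X_ref = R X_tgt + t. All function names are hypothetical. For each depth hypothesis d, a target pixel is mapped into the reference image by H(d) = K_ref (R + t nᵀ / d) K_tgt⁻¹ with n = (0, 0, 1).

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul3(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def intrinsics(f: float, cx: float, cy: float) -> Matrix:
    """Pinhole intrinsic matrix with square pixels and no skew (assumption)."""
    return [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]


def intrinsics_inverse(f: float, cx: float, cy: float) -> Matrix:
    """Closed-form inverse of the matrix returned by intrinsics()."""
    return [[1.0 / f, 0.0, -cx / f], [0.0, 1.0 / f, -cy / f], [0.0, 0.0, 1.0]]


def plane_homography(k_ref: Matrix, k_tgt_inv: Matrix, rot: Matrix,
                     trans: Sequence[float], depth: float) -> Matrix:
    """Homography from target pixels to reference pixels for the plane z = depth.

    With n = (0, 0, 1), the term t n^T / depth only changes the third
    column of the rotation matrix.
    """
    m = [[rot[i][j] + (trans[i] / depth if j == 2 else 0.0) for j in range(3)]
         for i in range(3)]
    return matmul3(matmul3(k_ref, m), k_tgt_inv)


def warp_pixel(h: Matrix, u: float, v: float) -> Tuple[float, float]:
    """Apply a homography to pixel (u, v) and dehomogenize."""
    x = h[0][0] * u + h[0][1] * v + h[0][2]
    y = h[1][0] * u + h[1][1] * v + h[1][2]
    w = h[2][0] * u + h[2][1] * v + h[2][2]
    return (x / w, y / w)


def sweep_pixel(k_ref: Matrix, k_tgt_inv: Matrix, rot: Matrix,
                trans: Sequence[float], depths: Sequence[float],
                u: float, v: float) -> List[Tuple[float, float]]:
    """Reference-image location of target pixel (u, v) for every depth plane."""
    return [warp_pixel(plane_homography(k_ref, k_tgt_inv, rot, trans, d), u, v)
            for d in depths]


if __name__ == "__main__":
    # Illustrative values only: identical cameras, pure sideways translation.
    k = intrinsics(100.0, 50.0, 50.0)
    k_inv = intrinsics_inverse(100.0, 50.0, 50.0)
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for d, uv in zip([1.0, 2.0, 4.0],
                     sweep_pixel(k, k_inv, identity, [0.1, 0.0, 0.0],
                                 [1.0, 2.0, 4.0], 50.0, 50.0)):
        print(d, uv)
```

Sampling the reference image at the warped locations for every pixel and every depth plane gives one slice of the volume per plane. In this toy setup the horizontal shift shrinks as the depth grows, which is the parallax cue a depth regression network can exploit.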
