论文信息 - 3DVNet: Multi-View Depth Prediction and Volumetric Refinement

3DVNet: Multi-View Depth Prediction and Volumetric Refinement

We present 3DVNet, a novel multi-view stereo (MVS) depth-prediction method that combines the advantages of previous depth-based and volumetric MVS approaches. Our key idea is the use of a 3D scene-modeling network that iteratively updates a set of coarse depth predictions, resulting in highly accurate predictions which agree on the underlying scene geometry. Unlike existing depth-prediction techniques, our method uses a volumetric 3D convolutional neural network (CNN) that operates in world space on all depth maps jointly. The network can therefore learn meaningful scene-level priors. Furthermore, unlike existing volumetric MVS techniques, our 3D CNN operates on a feature-augmented point cloud, allowing for effective aggregation of multi-view information and flexible iterative refinement of depth maps. Experimental results show our method exceeds state-of-the-art accuracy in both depth prediction and 3D reconstruction metrics on the ScanNet dataset, as well as a selection of scenes from the TUM-RGBD and ICL-NUIM datasets. This shows that our method is both effective and generalizes to new settings.

[1] Andrew J. Davison,et al. A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[2] Konrad Schindler,et al. Massively Parallel Multiview Stereopsis by Surface Normal Diffusion , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[3] Wolfram Burgard,et al. A benchmark for the evaluation of RGB-D SLAM systems , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[4] Stephen Lin,et al. DPSNet: End-to-end Deep Plane Sweep Stereo , 2019, ICLR.

[5] Vijay Badrinarayanan,et al. Atlas: End-to-End 3D Scene Reconstruction from Posed Images , 2020, ECCV.

[6] Yuxin Hou,et al. Multi-View Stereo by Temporal Nonparametric Fusion , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[7] Zehao Yu,et al. Fast-MVSNet: Sparse-to-Dense Multi-View Stereo With Learned Propagation and Gauss-Newton Refinement , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8] Silvio Savarese,et al. 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9] Vladlen Koltun,et al. Open3D: A Modern Library for 3D Data Processing , 2018, ArXiv.

[10] Vijay Badrinarayanan,et al. DELTAS: Depth Estimation by Learning Triangulation and Densification of Sparse Points , 2020, ECCV.

[11] Qi Shan,et al. MVS2D: Efficient Multiview Stereo via Attention-Driven 2D Convolutions , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12] Yue Wang,et al. Dynamic Graph CNN for Learning on Point Clouds , 2018, ACM Trans. Graph..

[13] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[14] Marc Pollefeys,et al. DeepVideoMVS: Multi-View Stereo on Video with Recurrent Spatio-Temporal Fusion , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15] Kaiming He,et al. Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16] Bo Chen,et al. MnasNet: Platform-Aware Neural Architecture Search for Mobile , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17] Gerard Pons-Moll,et al. Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18] Jan Eric Lenssen,et al. Fast Graph Representation Learning with PyTorch Geometric , 2019, ArXiv.

[19] Thomas Brox,et al. U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[20] Marc Pollefeys,et al. Convolutional Occupancy Networks , 2020, ECCV.

[21] Long Quan,et al. MVSNet: Depth Inference for Unstructured Multi-view Stereo , 2018, ECCV.

[22] Yu-Wing Tai,et al. Pyramid Multi-view Stereo Net with Self-adaptive View Aggregation , 2019, European Conference on Computer Vision.

[23] Shaojie Shen,et al. MVDepthNet: Real-Time Multiview Depth Estimation Neural Network , 2018, 2018 International Conference on 3D Vision (3DV).

[24] Tao Guan,et al. P-MVSNet: Learning Patch-Wise Matching Confidence Aggregation for Multi-View Stereo , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[25] Hujun Bao,et al. NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26] Matthias Nießner,et al. ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27] Henrik Aanæs,et al. Large Scale Multi-view Stereopsis Evaluation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[28] Jing Xu,et al. Point-Based Multi-View Stereo Network , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[29] Long Quan,et al. Recurrent MVSNet for High-Resolution Multi-View Stereo Depth Inference , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30] Jan-Michael Frahm,et al. Pixelwise View Selection for Unstructured Multi-View Stereo , 2016, ECCV.