Paper information - Deep Stereo Using Adaptive Thin Volume Representation With Uncertainty Awareness


We present Uncertainty-aware Cascaded Stereo Network (UCS-Net) for 3D reconstruction from multiple RGB images. Multi-view stereo (MVS) aims to reconstruct fine-grained scene geometry from multi-view images. Previous learning-based MVS methods estimate per-view depth using plane sweep volumes (PSVs) with a fixed depth hypothesis at each plane; this requires densely sampled planes for high accuracy, which is impractical for high-resolution depth because of limited memory. In contrast, we propose adaptive thin volumes (ATVs); in an ATV, the depth hypothesis of each plane varies spatially, adapting to the uncertainty of the per-pixel depth predicted at the previous stage. Our UCS-Net has three stages: the first stage processes a small PSV to predict low-resolution depth; two ATVs are then used in the following stages to refine the depth with higher resolution and higher accuracy. An ATV consists of only a few planes and has low memory and computation costs, yet it efficiently partitions local depth ranges within small, learned uncertainty intervals. We construct ATVs adaptively from variance-based uncertainty estimates; this differentiable process yields a reasonable, fine-grained spatial partitioning. Our multi-stage framework progressively subdivides the vast scene space with increasing depth resolution and precision, which enables reconstruction with high completeness and accuracy in a coarse-to-fine fashion. We demonstrate that our method outperforms other learning-based MVS methods on various challenging datasets.
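The difference between a PSV and an ATV, and the variance-based construction of an ATV, can be sketched for a single pixel as follows. This is a minimal sketch in plain Python, not the authors' implementation. It assumes that the previous stage outputs a normalized probability for each depth hypothesis of a pixel, that the uncertainty interval is the expected depth plus or minus `lam` standard deviations, and that the new planes are spaced evenly inside that interval. The function names and the parameter `lam` are hypothetical, and the upsampling of depth and uncertainty between stages is omitted.

```python
import math
from typing import List, Tuple


def uniform_planes(lo: float, hi: float, n: int) -> List[float]:
    """Return n depth values evenly spaced over [lo, hi] (requires n >= 2)."""
    if n < 2:
        raise ValueError("need at least two planes")
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def psv_hypotheses(d_min: float, d_max: float, n: int) -> List[float]:
    """Plane sweep volume: one fixed depth per plane, shared by every pixel."""
    return uniform_planes(d_min, d_max, n)


def mean_and_std(probs: List[float], hyps: List[float]) -> Tuple[float, float]:
    """Expected depth and standard deviation of one pixel's depth distribution.

    Assumes `probs` is already normalized (e.g. by a softmax) over `hyps`.
    """
    mean = sum(p * d for p, d in zip(probs, hyps))
    var = sum(p * (d - mean) ** 2 for p, d in zip(probs, hyps))
    return mean, math.sqrt(var)


def atv_hypotheses(probs: List[float], hyps: List[float],
                   lam: float, n: int) -> List[float]:
    """Adaptive thin volume for one pixel (hypothetical helper).

    Planes are spread evenly over [mean - lam * std, mean + lam * std],
    so each pixel gets its own set of depth hypotheses.
    """
    mean, std = mean_and_std(probs, hyps)
    return uniform_planes(mean - lam * std, mean + lam * std, n)


# Usage: a coarse PSV shared by all pixels, then per-pixel ATVs.
coarse = psv_hypotheses(1.0, 5.0, 5)
print(coarse)  # [1.0, 2.0, 3.0, 4.0, 5.0]

confident = [0.0, 0.0, 1.0, 0.0, 0.0]  # all probability on depth 3.0
unsure = [0.0, 0.5, 0.0, 0.5, 0.0]     # probability split between 2.0 and 4.0
print(atv_hypotheses(confident, coarse, lam=1.5, n=3))  # [3.0, 3.0, 3.0]
print(atv_hypotheses(unsure, coarse, lam=1.5, n=3))     # [1.5, 3.0, 4.5]
```

The PSV uses the same five depths everywhere, while each ATV covers only the interval its own pixel is uncertain about: the unsure pixel (mean 3.0, standard deviation 1.0) gets a wide thin volume, and the confident pixel's interval collapses to a single depth. A practical system would likely clamp the interval to a minimum width; this sketch does not.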
