Deep Stereo Using Adaptive Thin Volume Representation With Uncertainty Awareness

We present Uncertainty-aware Cascaded Stereo Network (UCS-Net) for 3D reconstruction from multiple RGB images. Multi-view stereo (MVS) aims to reconstruct fine-grained scene geometry from multi-view images. Previous learning-based MVS methods estimate per-view depth using plane sweep volumes (PSVs) with a fixed depth hypothesis at each plane; this requires densely sampled planes for high accuracy, which is impractical for high-resolution depth because of limited memory. In contrast, we propose adaptive thin volumes (ATVs); in an ATV, the depth hypothesis of each plane is spatially varying, which adapts to the uncertainties of previous per-pixel depth predictions. Our UCS-Net has three stages: the first stage processes a small PSV to predict low-resolution depth; two ATVs are then used in the following stages to refine the depth with higher resolution and higher accuracy. Our ATV consists of only a small number of planes with low memory and computation costs; yet, it efficiently partitions local depth ranges within learned small uncertainty intervals. We propose to use variance-based uncertainty estimates to adaptively construct ATVs; this differentiable process leads to reasonable and fine-grained spatial partitioning. Our multi-stage framework progressively sub-divides the vast scene space with increasing depth resolution and precision, which enables reconstruction with high completeness and accuracy in a coarse-to-fine fashion. We demonstrate that our method achieves superior performance compared with other learning-based MVS methods on various challenging datasets.

[1]  Jeremy S. De Bonet,et al.  Poxels: Probabilistic Voxelized Volume Reconstruction , 1999 .

[2]  Richard Szeliski,et al.  Handling occlusions in dense multi-view stereo , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[3]  Vladimir Kolmogorov,et al.  Multi-camera Scene Reconstruction via Graph Cuts , 2002, ECCV.

[4]  Olivier D. Faugeras,et al.  Variational stereovision and 3D scene flow estimation with statistical similarity measures , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[5]  Francis Schmitt,et al.  Silhouette and stereo fusion for 3D object modeling , 2003, Fourth International Conference on 3-D Digital Imaging and Modeling, 2003. 3DIM 2003. Proceedings..

[6]  Kiriakos N. Kutulakos,et al.  A Theory of Shape by Space Carving , 2000, International Journal of Computer Vision.

[7]  Long Quan,et al.  A quasi-dense approach to surface reconstruction from uncalibrated images , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Richard Szeliski,et al.  A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[9]  Roberto Cipolla,et al.  Using Multiple Hypotheses to Improve Depth-Maps for Multi-View Stereo , 2008, ECCV.

[10]  Jean Ponce,et al.  Accurate, Dense, and Robust Multiview Stereopsis , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Pascal Fua,et al.  Efficient large-scale multi-view stereo for ultra high-resolution image sets , 2011, Machine Vision and Applications.

[12]  Michael M. Kazhdan,et al.  Screened poisson surface reconstruction , 2013, TOGS.

[13]  Michael J. Black,et al.  Towards Probabilistic Volumetric Reconstruction Using Ray Potentials , 2015, 2015 International Conference on 3D Vision.

[14]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[15]  Rob Fergus,et al.  Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[16]  Konrad Schindler,et al.  Massively Parallel Multiview Stereopsis by Surface Normal Diffusion , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[17]  Gustavo Carneiro,et al.  Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue , 2016, ECCV.

[18]  John Flynn,et al.  Deep Stereo: Learning to Predict New Views from the World's Imagery , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Konrad Schindler,et al.  Just Look at the Image: Viewpoint-Specific Surface Normal Prediction for Improved Multi-View Reconstruction , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Jan-Michael Frahm,et al.  Pixelwise View Selection for Unstructured Multi-View Stereo , 2016, ECCV.

[21]  Anders Bjorholm Dahl,et al.  Large-Scale Data for Multiple-View Stereopsis , 2016, International Journal of Computer Vision.

[22]  Alex Kendall,et al.  End-to-End Learning of Geometry and Context for Deep Stereo Regression , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[23]  Thomas Brox,et al.  DeMoN: Depth and Motion Network for Learning Monocular Stereo , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Lu Fang,et al.  SurfaceNet: An End-to-End 3D Neural Network for Multiview Stereopsis , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[25]  Long Quan,et al.  Relative Camera Refinement for Accurate Dense Reconstruction , 2017, 2017 International Conference on 3D Vision (3DV).

[26]  Noah Snavely,et al.  Unsupervised Learning of Depth and Ego-Motion from Video , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Hao Su,et al.  A Point Set Generation Network for 3D Object Reconstruction from a Single Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  ARNO KNAPITSCH,et al.  Tanks and temples , 2017, ACM Trans. Graph..

[29]  Oisin Mac Aodha,et al.  Unsupervised Monocular Depth Estimation with Left-Right Consistency , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  J. Tenenbaum,et al.  MarrNet : 3 D Shape Reconstruction via 2 . 5 D Sketches , 2017 .

[31]  Jitendra Malik,et al.  Learning a Multi-View Stereo Machine , 2017, NIPS.

[32]  Karthik Ramani,et al.  SurfNet: Generating 3D Shape Surfaces Using Deep Residual Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Horst Bischof,et al.  OctNetFusion: Learning Depth Fusion from Data , 2017, 2017 International Conference on 3D Vision (3DV).

[34]  Matthias Nießner,et al.  Shape Completion Using 3D-Encoder-Predictor CNNs and Shape Synthesis , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Luc Van Gool,et al.  Learned Multi-patch Similarity , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[36]  Marc Pollefeys,et al.  From Point Clouds to Mesh Using Regression , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[37]  Narendra Ahuja,et al.  DeepMVS: Learning Multi-view Stereopsis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[38]  Philippos Mordohai,et al.  CBMV: A Coalesced Bidirectional Matching Volume for Disparity Estimation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[39]  Jitendra Malik,et al.  Learning Category-Specific Mesh Reconstruction from Image Collections , 2018, ECCV.

[40]  Luc Van Gool,et al.  RayNet: Learning Volumetric 3D Reconstruction with Ray Potentials , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[41]  Stefan Roth,et al.  Matryoshka Networks: Predicting 3D Geometry via Nested Shape Layers , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[42]  Wei Liu,et al.  Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images , 2018, ECCV.

[43]  Chen Kong,et al.  Learning Efficient Point Cloud Generation for Dense 3D Object Reconstruction , 2017, AAAI.

[44]  Ping Tan,et al.  BA-Net: Dense Bundle Adjustment Network , 2018, ICLR 2018.

[45]  Alexey Dosovitskiy,et al.  Unsupervised Learning of Shape and Pose with Differentiable Point Clouds , 2018, NeurIPS.

[46]  Thomas Brox,et al.  DeepTAM: Deep Tracking and Mapping , 2018, ECCV.

[47]  Tatsuya Harada,et al.  Neural 3D Mesh Renderer , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[48]  Long Quan,et al.  MVSNet: Depth Inference for Unstructured Multi-view Stereo , 2018, ECCV.

[49]  Leonidas J. Guibas,et al.  Learning Representations and Generative Models for 3D Point Clouds , 2017, ICML.

[50]  Jitendra Malik,et al.  End-to-End Recovery of Human Shape and Pose , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[51]  Jiajun Wu,et al.  Learning Shape Priors for Single-View 3D Completion and Reconstruction , 2018, ECCV.

[52]  Jiajun Wu,et al.  Learning to Reconstruct Shapes from Unseen Classes , 2018, NeurIPS.

[53]  Stephen Lin,et al.  DPSNet: End-to-end Deep Plane Sweep Stereo , 2019, ICLR.

[54]  Long Quan,et al.  Recurrent MVSNet for High-Resolution Multi-View Stereo Depth Inference , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Sebastian Nowozin,et al.  Occupancy Networks: Learning 3D Reconstruction in Function Space , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Yan Lu,et al.  MVPNet: Multi-View Point Regression Networks for 3D Object Reconstruction from A Single Image , 2018, AAAI.

[57]  Jing Xu,et al.  Point-Based Multi-View Stereo Network , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[58]  Charless C. Fowlkes,et al.  3D Scene Reconstruction With Multi-Layer Depth and Epipolar Transformers , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[59]  Kalyan Sunkavalli,et al.  Deep view synthesis from sparse photometric images , 2019, ACM Trans. Graph..

[60]  Vittorio Ferrari,et al.  Learning Single-Image 3D Reconstruction by Generative Modelling of Shape, Pose and Shading , 2019, International Journal of Computer Vision.

[61]  Jitendra Malik,et al.  Multi-view Supervision for Single-View Reconstruction via Differentiable Ray Consistency , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[62]  Hao Zhang,et al.  Learning Implicit Fields for Generative Shape Modeling , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[63]  Charless C. Fowlkes,et al.  Multi-layer Depth and Epipolar Feature Transformers for 3D Scene Reconstruction , 2019, CVPR Workshops.