Pyramid Multi-view Stereo Net with Self-adaptive View Aggregation

n this paper, we propose an effective and efficient pyramid multi-view stereo (MVS) net with self-adaptive view aggregation for accurate and complete dense point cloud reconstruction. Different from using mean square variance to generate cost volume in previous deep-learning based MVS methods, our \textbf{VA-MVSNet} incorporates the cost variances in different views with small extra memory consumption by introducing two novel self-adaptive view aggregations: pixel-wise view aggregation and voxel-wise view aggregation. To further boost the robustness and completeness of 3D point cloud reconstruction, we extend VA-MVSNet with pyramid multi-scale images input as \textbf{PVA-MVSNet}, where multi-metric constraints are leveraged to aggregate the reliable depth estimation at the coarser scale to fill in the mismatched regions at the finer scale. Experimental results show that our approach establishes a new state-of-the-art on the \textsl{\textbf{DTU}} dataset with significant improvements in the completeness and overall quality, and has strong generalization by achieving a comparable performance as the state-of-the-art methods on the \textsl{\textbf{Tanks and Temples}} benchmark. Our codebase is at \hyperlink{this https URL}{this https URL}

[1]  Jan-Michael Frahm,et al.  PatchMatch Based Joint View Selection and Depthmap Estimation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Thomas Brox,et al.  FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[3]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[4]  Jan-Michael Frahm,et al.  Pixelwise View Selection for Unstructured Multi-View Stereo , 2016, ECCV.

[5]  Narendra Ahuja,et al.  DeepMVS: Learning Multi-view Stereopsis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Marc Pollefeys,et al.  Multi-View Stereo via Graph Cuts on the Dual of an Adaptive Tetrahedral Mesh , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[7]  Alex Kendall,et al.  End-to-End Learning of Geometry and Context for Deep Stereo Regression , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[8]  Roberto Cipolla,et al.  Multiview Stereo via Volumetric Graph-Cuts and Occlusion Robust Photo-Consistency , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Sina Honari,et al.  Improving Landmark Localization with Semi-Supervised Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[10]  Pascal Fua,et al.  Efficient large-scale multi-view stereo for ultra high-resolution image sets , 2011, Machine Vision and Applications.

[11]  Long Quan,et al.  A quasi-dense approach to surface reconstruction from uncalibrated images , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Jean-Philippe Pons,et al.  Towards high-resolution large-scale multi-view stereo , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Anders Bjorholm Dahl,et al.  Large-Scale Data for Multiple-View Stereopsis , 2016, International Journal of Computer Vision.

[14]  Stephen Lin,et al.  DPSNet: End-to-end Deep Plane Sweep Stereo , 2019, ICLR.

[15]  Konrad Schindler,et al.  Massively Parallel Multiview Stereopsis by Surface Normal Diffusion , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[16]  John Flynn,et al.  Deep Stereo: Learning to Predict New Views from the World's Imagery , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Jan-Michael Frahm,et al.  Structure-from-Motion Revisited , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Xu Zhao,et al.  EdgeStereo: A Context Integrated Residual Pyramid Network for Stereo Matching , 2018, ACCV.

[19]  Long Quan,et al.  MVSNet: Depth Inference for Unstructured Multi-view Stereo , 2018, ECCV.

[20]  Yoshua Bengio,et al.  Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[21]  Tao Guan,et al.  P-MVSNet: Learning Patch-Wise Matching Confidence Aggregation for Multi-View Stereo , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[22]  Carsten Rother,et al.  Fast cost-volume filtering for visual correspondence and beyond , 2011, CVPR 2011.

[23]  Jean Ponce,et al.  Accurate, Dense, and Robust Multiview Stereopsis , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Zhidong Deng,et al.  SegStereo: Exploiting Semantic Information for Disparity Estimation , 2018, ECCV.

[25]  Lu Fang,et al.  SurfaceNet: An End-to-End 3D Neural Network for Multiview Stereopsis , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[26]  Siyu Zhu,et al.  Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Long Quan,et al.  Recurrent MVSNet for High-Resolution Multi-View Stereo Depth Inference , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Daniel Cremers,et al.  Multiview Stereo and Silhouette Consistency via Convex Functionals over Convex Domains , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Torsten Sattler,et al.  A Multi-view Stereo Benchmark with High-Resolution Images and Multi-camera Videos , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Luc Van Gool,et al.  Learned Multi-patch Similarity , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[32]  Hao Su,et al.  Deep Stereo Using Adaptive Thin Volume Representation With Uncertainty Awareness , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Richard Szeliski,et al.  A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[35]  ARNO KNAPITSCH,et al.  Tanks and temples , 2017, ACM Trans. Graph..

[36]  Roberto Cipolla,et al.  Using Multiple Hypotheses to Improve Depth-Maps for Multi-View Stereo , 2008, ECCV.

[37]  Jing Xu,et al.  Point-Based Multi-View Stereo Network , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[38]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[39]  Pascal Fua,et al.  On benchmarking camera calibration and multi-view stereo for high resolution imagery , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Jan Kautz,et al.  PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[41]  Wenbing Tao,et al.  Multi-Scale Geometric Consistency Guided Multi-View Stereo , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[43]  Michael Goesele,et al.  Multi-View Stereo for Community Photo Collections , 2007, 2007 IEEE 11th International Conference on Computer Vision.