PHI-MVS: Plane Hypothesis Inference Multi-view Stereo for Large-Scale Scene Reconstruction

PatchMatch-based Multi-view Stereo (MVS) algorithms have achieved great success in large-scale scene reconstruction tasks. However, reconstruction of texture-less planes often fails because similarity measures can become ineffective in these regions. To address this issue, a new plane hypothesis inference strategy is proposed. The procedure consists of two steps: first, multiple plane hypotheses are generated from filtered initial depth maps in regions that were not successfully recovered; second, depth hypotheses are selected using a Markov Random Field (MRF). The strategy significantly improves the completeness of the reconstruction results at the cost of an acceptable increase in computing time. In addition, a new acceleration scheme similar to dilated convolution speeds up depth map estimation with only a slight influence on the reconstruction. We integrate these ideas into a new MVS pipeline, Plane Hypothesis Inference Multi-view Stereo (PHI-MVS). PHI-MVS is validated on the public ETH3D benchmark, where it demonstrates performance competitive with the state of the art.
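The second step, selecting one depth hypothesis per pixel with an MRF, can be illustrated with a minimal sketch. The abstract does not specify the energy terms or the optimizer, so everything below is an assumption for illustration: the function name `select_hypotheses_icm`, the truncated absolute depth difference used as the pairwise term, the parameters `smooth_weight` and `trunc`, and the use of Iterated Conditional Modes (ICM) as a simple stand-in solver. It is not the PHI-MVS implementation.

```python
def select_hypotheses_icm(unary, depths, smooth_weight=1.0, trunc=1.0, iters=10):
    """Pick one depth hypothesis per pixel on a 4-connected grid MRF.

    unary[y][x][k]  : matching cost of hypothesis k at pixel (y, x)
    depths[y][x][k] : depth value of hypothesis k at pixel (y, x)
    All names and the energy form are illustrative assumptions.
    """
    h, w = len(unary), len(unary[0])

    # Initialise every pixel with its lowest-unary-cost hypothesis.
    labels = []
    for y in range(h):
        row = []
        for x in range(w):
            costs = unary[y][x]
            row.append(min(range(len(costs)), key=costs.__getitem__))
        labels.append(row)

    def local_energy(y, x, k):
        # Unary cost plus truncated depth difference to each neighbour's
        # currently selected depth.
        e = unary[y][x][k]
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w:
                d_n = depths[ny][nx][labels[ny][nx]]
                e += smooth_weight * min(abs(depths[y][x][k] - d_n), trunc)
        return e

    # ICM: greedily re-label one pixel at a time until nothing changes.
    for _ in range(iters):
        changed = False
        for y in range(h):
            for x in range(w):
                best_k = labels[y][x]
                best_e = local_energy(y, x, best_k)
                for k in range(len(unary[y][x])):
                    e = local_energy(y, x, k)
                    if e < best_e:  # strict improvement keeps ICM monotone
                        best_k, best_e = k, e
                if best_k != labels[y][x]:
                    labels[y][x] = best_k
                    changed = True
        if not changed:
            break
    return labels


# Toy row of three pixels, each with two hypotheses (depth 1.0 or 5.0).
# The middle pixel is ambiguous and slightly prefers the wrong depth.
toy_depths = [[[1.0, 5.0], [1.0, 5.0], [1.0, 5.0]]]
toy_unary = [[[0.0, 2.0], [0.6, 0.5], [0.0, 2.0]]]
toy_labels = select_hypotheses_icm(toy_unary, toy_depths)
print(toy_labels)  # [[0, 0, 0]]
```

In the toy example the middle pixel starts with hypothesis 1 because its unary cost is marginally lower, and the smoothness term then pulls it to the depth its neighbours agree on. ICM only finds a local minimum; graph cuts or message passing are common alternatives for energies of this kind.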
