MVSCRF: Learning Multi-View Stereo With Conditional Random Fields

We present a deep-learning architecture for multi-view stereo with conditional random fields (MVSCRF). Given an arbitrary number of input images, we first use a U-shape neural network to extract deep features incorporating both global and local information, and then build a 3D cost volume for the reference camera. Unlike previous learning based methods, we explicitly constraint the smoothness of depth maps by using conditional random fields (CRFs) after the stage of cost volume regularization. The CRFs module is implemented as recurrent neural networks so that the whole pipeline can be trained end-to-end. Our results show that the proposed pipeline outperforms previous state-of-the-arts on large-scale DTU dataset. We also achieve comparable results with state-of-the-art learning based methods on outdoor Tanks and Temples dataset without fine-tuning, which demonstrates our method’s generalization ability.

[1]  Vladlen Koltun,et al.  Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[2]  Jean Ponce,et al.  Accurate, Dense, and Robust Multiview Stereopsis , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Pascal Monasse,et al.  OpenMVG: Open Multiple View Geometry , 2016, RRPR@ICPR.

[4]  Lu Fang,et al.  SurfaceNet: An End-to-End 3D Neural Network for Multiview Stereopsis , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[5]  Jan-Michael Frahm,et al.  Structure-from-Motion Revisited , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Wenbing Tao,et al.  Multi-View Stereo with Asymmetric Checkerboard Propagation and Multi-Hypothesis Joint View Selection , 2018, ArXiv.

[7]  Jan-Michael Frahm,et al.  Pixelwise View Selection for Unstructured Multi-View Stereo , 2016, ECCV.

[8]  Long Quan,et al.  MVSNet: Depth Inference for Unstructured Multi-view Stereo , 2018, ECCV.

[9]  Vibhav Vineet,et al.  Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[10]  Zoran Obradovic,et al.  Continuous Conditional Random Fields for Efficient Regression in Large Fully Connected Graphs , 2013, AAAI.

[11]  Silvio Savarese,et al.  3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction , 2016, ECCV.

[12]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.

[13]  Kihong Park,et al.  Learning Descriptor, Confidence, and Depth Estimation in Multi-view Stereo , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[14]  Long Quan,et al.  Recurrent MVSNet for High-Resolution Multi-View Stereo Depth Inference , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Thomas Brox,et al.  Striving for Simplicity: The All Convolutional Net , 2014, ICLR.

[16]  George Papandreou,et al.  Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation , 2018, ECCV.

[17]  Pascal Fua,et al.  Efficient large-scale multi-view stereo for ultra high-resolution image sets , 2011, Machine Vision and Applications.

[18]  Jürgen Schmidhuber,et al.  Learning to forget: continual prediction with LSTM , 1999 .

[19]  Nicu Sebe,et al.  Multi-scale Continuous CRFs as Sequential Deep Networks for Monocular Depth Estimation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Narendra Ahuja,et al.  DeepMVS: Learning Multi-view Stereopsis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21]  Konrad Schindler,et al.  Massively Parallel Multiview Stereopsis by Surface Normal Diffusion , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[22]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[23]  Jitendra Malik,et al.  Learning a Multi-View Stereo Machine , 2017, NIPS.

[24]  ARNO KNAPITSCH,et al.  Tanks and temples , 2017, ACM Trans. Graph..

[25]  Roberto Cipolla,et al.  Using Multiple Hypotheses to Improve Depth-Maps for Multi-View Stereo , 2008, ECCV.

[26]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[27]  Henrik Aanæs,et al.  Large Scale Multi-view Stereopsis Evaluation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[29]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).