论文信息 - Attention-Aware Multi-View Stereo

Attention-Aware Multi-View Stereo

Multi-view stereo is a crucial task in computer vision, that requires accurate and robust photo-consistency among input images for depth estimation. Recent studies have shown that learning-based feature matching and confidence regularization can play a vital role in this task. Nevertheless, how to design good matching confidence volumes as well as effective regularizers for them are still under in-depth study. In this paper, we propose an attention-aware deep neural network “AttMVS” for learning multi-view stereo. In particular, we propose a novel attention-enhanced matching confidence volume, that combines the raw pixel-wise matching confidence from the extracted perceptual features with the contextual information of local scenes, to improve the matching robustness. Furthermore, we develop an attention-guided regularization module, which consists of multilevel ray fusion modules, to hierarchically aggregate and regularize the matching confidence volume into a latent depth probability volume.Experimental results show that our approach achieves the best overall performance on the DTU dataset and the intermediate sequences of Tanks & Temples benchmark over many state-of-the-art MVS algorithms.

[1] Jan-Michael Frahm,et al. PatchMatch Based Joint View Selection and Depthmap Estimation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[2] Wenbing Tao,et al. Multi-Scale Geometric Consistency Guided Multi-View Stereo , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3] Long Quan,et al. Recurrent MVSNet for High-Resolution Multi-View Stereo Depth Inference , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4] Guan Huang,et al. Attention-Guided Unified Network for Panoptic Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5] Xiaogang Wang,et al. Residual Attention Network for Image Classification , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6] Tao Guan,et al. P-MVSNet: Learning Patch-Wise Matching Confidence Aggregation for Multi-View Stereo , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[7] Jean Ponce,et al. Accurate, Dense, and Robust Multiview Stereopsis , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8] Long Quan,et al. MVSNet: Depth Inference for Unstructured Multi-view Stereo , 2018, ECCV.

[9] Federico Tombari,et al. ZNCC-based template matching using bounded partial correlation , 2004 .

[10] Andrea Vedaldi,et al. Instance Normalization: The Missing Ingredient for Fast Stylization , 2016, ArXiv.

[11] Yong-Sheng Chen,et al. Pyramid Stereo Matching Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[12] Xiaogang Wang,et al. Context Encoding for Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13] Jie Liao,et al. Pyramid Multi‐View Stereo with Local Consistency , 2019, Comput. Graph. Forum.

[14] Lu Fang,et al. SurfaceNet: An End-to-End 3D Neural Network for Multiview Stereopsis , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[15] Wenbing Tao,et al. Multi-View Stereo with Asymmetric Checkerboard Propagation and Multi-Hypothesis Joint View Selection , 2018, ArXiv.

[16] Junqing Yu,et al. Significance-Aware Information Bottleneck for Domain Adaptive Semantic Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[17] Jan-Michael Frahm,et al. Pixelwise View Selection for Unstructured Multi-View Stereo , 2016, ECCV.

[18] Enhua Wu,et al. Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19] Shu-Tao Xia,et al. Second-Order Attention Network for Single Image Super-Resolution , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20] Thomas Brox,et al. FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21] Konrad Schindler,et al. Massively Parallel Multiview Stereopsis by Surface Normal Diffusion , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[22] Adam Finkelstein,et al. PatchMatch: a randomized correspondence algorithm for structural image editing , 2009, SIGGRAPH 2009.

[23] Pascal Fua,et al. Efficient large-scale multi-view stereo for ultra high-resolution image sets , 2011, Machine Vision and Applications.

[24] Torsten Sattler,et al. A Multi-view Stereo Benchmark with High-Resolution Images and Multi-camera Videos , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25] Yi Yang,et al. Taking a Closer Look at Domain Shift: Category-Level Adversaries for Semantics Consistent Domain Adaptation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26] Anders Bjorholm Dahl,et al. Large-Scale Data for Multiple-View Stereopsis , 2016, International Journal of Computer Vision.

[27] Michael M. Kazhdan,et al. Screened poisson surface reconstruction , 2013, TOGS.

[28] Gang Yu,et al. Learning a Discriminative Feature Network for Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[29] Luc Van Gool,et al. RayNet: Learning Volumetric 3D Reconstruction with Ray Potentials , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[30] ARNO KNAPITSCH,et al. Tanks and temples , 2017, ACM Trans. Graph..

[31] Roberto Cipolla,et al. Using Multiple Hypotheses to Improve Depth-Maps for Multi-View Stereo , 2008, ECCV.

[32] Matteo Matteucci,et al. TAPA-MVS: Textureless-Aware PAtchMatch Multi-View Stereo , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[33] Trevor Darrell,et al. Deep Layer Aggregation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[34] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.

[35] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[36] Jitendra Malik,et al. Learning a Multi-View Stereo Machine , 2017, NIPS.

[37] Jean-Philippe Pons,et al. High Accuracy and Visibility-Consistent Dense Multiview Stereo , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38] Narendra Ahuja,et al. DeepMVS: Learning Multi-view Stereopsis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[39] Shan Lin,et al. Plane Completion and Filtering for Multi-View Stereo Reconstruction , 2019, GCPR.

[40] Jing Xu,et al. Point-Based Multi-View Stereo Network , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[41] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.