Pruning multi-view stereo net for efficient 3D reconstruction

Abstract How can we perform an efficient 3D reconstruction with high accuracy and completeness, in the presence of non-Lambertian surface and low textured regions? This paper aims at fast quality 3D reconstruction, best near real time. While deep learning approaches perform very well in multi-view stereo (MVS), the high complexity of models makes them inapplicable in practical applications. Few works were explored to accelerate deep learning-based 3D reconstruction approaches. In this paper, we take an unprecedented attempt to compress and accelerate these models via pruning their redundant parameters. We introduce an efficient channel pruning method for 2D convolutional neural networks (CNNs) based on a mixed back propagation process, where a soft mask is learned to prune the channels using a fast iterative shrinkage-thresholding algorithm. While in 3D CNNs, we train a large multi-scale CNNs architecture and observe that only utilizing one module enough for the 3D reconstruction, which can still maintain the performance of the full-precision model. We achieve an efficient MVS reconstruction system up to 2 times faster, in contrast to the state-of-the-arts, while maintaining comparable model accuracy and even better completeness.

[1]  Andreas Geiger,et al.  Object scene flow for autonomous vehicles , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Robert T. Collins,et al.  A space-sweep approach to true multi-image matching , 1996, Proceedings CVPR IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[3]  Wonyong Sung,et al.  Structured Pruning of Deep Convolutional Neural Networks , 2015, ACM J. Emerg. Technol. Comput. Syst..

[4]  Henrik Aanæs,et al.  Large Scale Multi-view Stereopsis Evaluation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[6]  Marc Teboulle,et al.  A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems , 2009, SIAM J. Imaging Sci..

[7]  Jan Kautz,et al.  PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8]  Long Quan,et al.  MVSNet: Depth Inference for Unstructured Multi-view Stereo , 2018, ECCV.

[9]  Russell H. Taylor,et al.  Extremely Dense Point Correspondences Using a Learned Feature Descriptor , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Zhuowen Tu,et al.  Deeply-Supervised Nets , 2014, AISTATS.

[11]  Darius Burschka,et al.  Epipolar-Based Stereo Tracking Without Explicit 3D Reconstruction , 2010, 2010 20th International Conference on Pattern Recognition.

[12]  Nima Tajbakhsh,et al.  UNet++: A Nested U-Net Architecture for Medical Image Segmentation , 2018, DLMIA/ML-CDS@MICCAI.

[13]  Yann LeCun,et al.  Signature Verification Using A "Siamese" Time Delay Neural Network , 1993, Int. J. Pattern Recognit. Artif. Intell..

[14]  Nanning Zheng,et al.  Stereo Matching Using Belief Propagation , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[15]  Daniel Mirota,et al.  Robust motion estimation and structure recovery from endoscopic image sequences with an Adaptive Scale Kernel Consensus estimator , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Yiran Chen,et al.  Learning Structured Sparsity in Deep Neural Networks , 2016, NIPS.

[17]  Andreas Geiger,et al.  Displets: Resolving stereo ambiguities using object knowledge , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Lior Wolf,et al.  Improved Stereo Matching with Constant Highway Networks and Reflective Confidence Learning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Liujuan Cao,et al.  Towards Optimal Structured CNN Pruning via Generative Adversarial Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Wei Mao,et al.  Cost Volume Pyramid Based Depth Inference for Multi-View Stereo , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Rongrong Ji,et al.  Accelerating Convolutional Networks via Global & Dynamic Filter Pruning , 2018, IJCAI.

[22]  Shaohuai Shi,et al.  FADNet: A Fast and Accurate Network for Disparity Estimation , 2020, 2020 IEEE International Conference on Robotics and Automation (ICRA).

[23]  Horst Bischof,et al.  Pushing the limits of stereo using variational stereo estimation , 2012, 2012 IEEE Intelligent Vehicles Symposium.

[24]  Michael M. Kazhdan,et al.  Screened poisson surface reconstruction , 2013, TOGS.

[25]  Darius Burschka,et al.  Advances in Computational Stereo , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[26]  Roberto Cipolla,et al.  Using Multiple Hypotheses to Improve Depth-Maps for Multi-View Stereo , 2008, ECCV.

[27]  Daniel Mirota,et al.  Is Multi-model Feature Matching Better for Endoscopic Motion Estimation? , 2014, CARE@MICCAI.

[28]  Mathieu Salzmann,et al.  Learning the Number of Neurons in Deep Networks , 2016, NIPS.

[29]  Andrew Howard,et al.  Real-time stereo visual odometry for autonomous ground vehicles , 2008, 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[30]  Alex Kendall,et al.  End-to-End Learning of Geometry and Context for Deep Stereo Regression , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[31]  Long Quan,et al.  Recurrent MVSNet for High-Resolution Multi-View Stereo Depth Inference , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Simon Lacroix,et al.  Vision-Based SLAM: Stereo and Monocular Approaches , 2007, International Journal of Computer Vision.

[33]  Narendra Ahuja,et al.  DeepMVS: Learning Multi-view Stereopsis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[34]  Yong-Sheng Chen,et al.  Pyramid Stereo Matching Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[35]  Ruigang Yang,et al.  GA-Net: Guided Aggregation Net for End-To-End Stereo Matching , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Jan-Michael Frahm,et al.  Pixelwise View Selection for Unstructured Multi-View Stereo , 2016, ECCV.

[37]  Pascal Fua,et al.  Efficient large-scale multi-view stereo for ultra high-resolution image sets , 2011, Machine Vision and Applications.

[38]  Jean Ponce,et al.  Accurate, Dense, and Robust Multiview Stereopsis , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[39]  Larry S. Davis,et al.  NISP: Pruning Networks Using Neuron Importance Score Propagation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[40]  Lu Fang,et al.  SurfaceNet: An End-to-End 3D Neural Network for Multiview Stereopsis , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[41]  Aaron F. Bobick,et al.  Disparity-Space Images and Large Occlusion Stereo , 1994, ECCV.

[42]  Siyu Zhu,et al.  Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Stefan Roth,et al.  Iterative Residual Refinement for Joint Optical Flow and Occlusion Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Mathieu Salzmann,et al.  Compression-aware Training of Deep Networks , 2017, NIPS.

[45]  Konrad Schindler,et al.  Massively Parallel Multiview Stereopsis by Surface Normal Diffusion , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[46]  Carsten Rother,et al.  PatchMatch Stereo - Stereo Matching with Slanted Support Windows , 2011, BMVC.

[47]  Sheng Tang,et al.  Auto-Balanced Filter Pruning for Efficient Convolutional Neural Networks , 2018, AAAI.

[48]  Yan Xu,et al.  MaskFlownet: Asymmetric Feature Matching With Learnable Occlusion Mask , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).