Boosting Monocular Depth Estimation with Lightweight 3D Point Fusion

In this paper, we address the problem of fusing monocular depth estimation with conventional multi-view stereo or SLAM to exploit the “best of both worlds”, that is, the accurate dense depth of the former and the lightweight nature of the latter. More specifically, we use a conventional pipeline to produce a sparse 3D point cloud, which is fed to a monocular depth estimation network to enhance its performance. In this way, we achieve accuracy similar to multi-view stereo with considerably fewer network weights. We also show that as few as 32 points are sufficient to outperform the best monocular depth estimation methods, and that around 200 points are enough to gain the full advantage of the additional information. Moreover, we demonstrate the efficacy of our approach by integrating it with a SLAM system built into mobile devices.
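
As a rough illustration of the input side of such a pipeline, the sketch below subsamples a sparse point cloud to a fixed budget (for example 32 or 200 points) and associates each point with an image pixel through a pinhole camera model. This is only a minimal sketch under stated assumptions: the function names `subsample_points` and `project_to_pixels`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the uniform random sampling are all hypothetical and are not taken from the paper, and the abstract does not specify how the network actually consumes the points.

```python
import random


def subsample_points(points, n, seed=0):
    """Return at most n points, drawn uniformly at random without replacement.

    Assumption: uniform sampling; the paper's selection strategy is not
    described in the abstract.
    """
    points = list(points)
    if len(points) <= n:
        return points
    return random.Random(seed).sample(points, n)


def project_to_pixels(points_cam, fx, fy, cx, cy, width, height):
    """Project camera-frame (x, y, z) points with a pinhole model.

    Returns a list of (u, v, depth) tuples for points that lie in front of
    the camera and fall inside a width x height image.
    """
    projected = []
    for x, y, z in points_cam:
        if z <= 0:
            continue  # behind the camera, no valid projection
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            projected.append((u, v, z))
    return projected


if __name__ == "__main__":
    # Hypothetical point cloud and intrinsics, for demonstration only.
    cloud = [(0.001 * i, 0.0, 1.0 + 0.01 * i) for i in range(1000)]
    sparse = subsample_points(cloud, 32)
    pixels = project_to_pixels(sparse, 500.0, 500.0, 320.0, 240.0, 640, 480)
    print(len(sparse), len(pixels))
```

The resulting (pixel, depth) pairs could then be passed to a depth network alongside the RGB image; how they are encoded and fused is a design choice this sketch leaves open.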
