MONOCULAR-DEPTH ASSISTED SEMI-GLOBAL MATCHING

Abstract. Reconstruction of dense photogrammetric point clouds is often based on depth estimation from rectified image pairs by means of pixel-wise matching. Its main drawback is the high computational complexity compared to the relatively straightforward task of laser triangulation. Dense image matching requires oriented and rectified images and searches for point correspondences between them. This search rests on two assumptions: corresponding pixels and their local neighborhoods show similar radiometry, and scenes are mostly homogeneous, meaning that neighboring points in one image are most likely also neighbors in the second. Both assumptions are violated, however, at depth changes in the scene. Optimization strategies seek the best depth estimate from the resulting disparities between the two images. A recent line of neural-network research is the estimation of a depth image from a single input image by learning geometric relations in images. These networks are able to find homogeneous areas as well as depth changes, but the geometric accuracy of the estimated depth is much lower than that of dense matching strategies. In this paper, a method is proposed that extends the Semi-Global Matching (SGM) algorithm with a priori knowledge from a monocular depth estimation network: the disparity range for the point correspondence search is predicted from the single-image depth estimation (SIDE). The method also saves resources through path optimization and parallelization. The algorithm is benchmarked on Middlebury data, and results are presented both quantitatively and qualitatively.
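
The idea of predicting a disparity range from a monocular depth estimate can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `disparity_search_range`, the relative margin `rel_margin`, and the cap `d_max` are hypothetical, and the depth prior is assumed to be metric and aligned with the rectified reference image. The sketch uses the standard rectified-stereo relation d = f * B / Z to turn the prior into a predicted disparity and then widens it into a per-pixel search interval.

```python
import numpy as np

def disparity_search_range(depth_prior, focal_px, baseline, rel_margin=0.25, d_max=255):
    """Per-pixel disparity interval [d_lo, d_hi] from a monocular depth prior.

    Assumptions (not from the paper): depth_prior is metric depth in the same
    unit as baseline, focal_px is the focal length in pixels, and rel_margin
    is a hypothetical tolerance for the low accuracy of the prior.
    """
    # Guard against zero or negative depth values in the prior.
    depth = np.maximum(np.asarray(depth_prior, dtype=np.float64), 1e-6)

    # Standard rectified-stereo relation: disparity = focal * baseline / depth.
    d_pred = focal_px * baseline / depth

    # Widen the prediction into a search interval and clamp to valid disparities.
    d_lo = np.clip(np.floor(d_pred * (1.0 - rel_margin)), 0, d_max).astype(np.int32)
    d_hi = np.clip(np.ceil(d_pred * (1.0 + rel_margin)), 0, d_max).astype(np.int32)
    return d_lo, d_hi
```

A matcher would then evaluate matching costs only for disparities inside `[d_lo, d_hi]` at each pixel instead of the full range `[0, d_max]`, which is where the saving in the correspondence search would come from under these assumptions.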
