How Do Neural Networks See Depth in Single Images?

Deep neural networks have lead to a breakthrough in depth estimation from single images. Recent work often focuses on the accuracy of the depth map, where an evaluation on a publicly available test set such as the KITTI vision benchmark is often the main result of the article. While such an evaluation shows how well neural networks can estimate depth, it does not show how they do this. To the best of our knowledge, no work currently exists that analyzes what these networks have learned. In this work we take the MonoDepth network by Godard et al. and investigate what visual cues it exploits for depth estimation. We find that the network ignores the apparent size of known obstacles in favor of their vertical position in the image. Using the vertical position requires the camera pose to be known; however we find that MonoDepth only partially corrects for changes in camera pitch and roll and that these influence the estimated depth towards obstacles. We further show that MonoDepth's use of the vertical image position allows it to estimate the distance towards arbitrary obstacles, even those not appearing in the training set, but that it requires a strong edge at the ground contact point of the object to do so. In future work we will investigate whether these observations also apply to other neural networks for monocular depth estimation.

[1]  Noah Snavely,et al.  Unsupervised Learning of Depth and Ego-Motion from Video , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Jitendra Malik,et al.  Shape, Illumination, and Reflectance from Shading , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Zijiang J. He,et al.  Distance determined by the angular declination below the horizon , 2001, Nature.

[4]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[5]  Rob Fergus,et al.  Depth Map Prediction from a Single Image using a Multi-Scale Deep Network , 2014, NIPS.

[6]  Andreas Geiger,et al.  Object scene flow for autonomous vehicles , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Pascal Vincent,et al.  Visualizing Higher-Layer Features of a Deep Network , 2009 .

[8]  Ashutosh Saxena,et al.  Learning Depth from Single Monocular Images , 2005, NIPS.

[9]  Vertical position as a cue to pictorial depth: Height in the picture plane versus distance to the horizon , 2010, Attention, perception & psychophysics.

[10]  Simon Lucey,et al.  Learning Depth from Monocular Videos Using Direct Methods , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11]  W. Epstein,et al.  Perceived depth as a function of relative height under three background conditions. , 1966, Journal of experimental psychology.

[12]  Ping-Sing Tsai,et al.  Shape from Shading: A Survey , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[13]  B. Anderson Depth perception , 2008 .

[14]  James E. Cutting,et al.  Chapter 3 – Perceiving Layout and Knowing Distances: The Integration, Relative Potency, and Contextual Use of Different Information about Depth* , 1995 .

[15]  Gustavo Carneiro,et al.  Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue , 2016, ECCV.

[16]  Joan Bruna,et al.  Intriguing properties of neural networks , 2013, ICLR.

[17]  R. Hetherington The Perception of the Visual World , 1952 .

[18]  Jörg Stückler,et al.  Semi-Supervised Deep Learning for Monocular Depth Map Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Bruno Reichart,et al.  Does size matter? , 2018, Xenotransplantation.

[20]  Oisin Mac Aodha,et al.  Unsupervised Monocular Depth Estimation with Left-Right Consistency , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  David A. Forsyth,et al.  Shape from Texture without Boundaries , 2002, International Journal of Computer Vision.

[22]  Stefano Soatto,et al.  A geometric approach to shape from defocus , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[24]  Deborah Silver,et al.  Feature Visualization , 1994, Scientific Visualization.