Inferring Distributions Over Depth from a Single Image

When building a geometric scene understanding system for autonomous vehicles, it is crucial to know when the system might fail. Most contemporary approaches cast the problem as depth regression, whose output is a depth value for each pixel. Such approaches cannot diagnose when failures might occur. One attractive alternative is a deep Bayesian network, which captures uncertainty in both model parameters and ambiguous sensor measurements. However, estimating uncertainties is often slow and the distributions are often limited to be uni-modal. In this paper, we recast the continuous problem of depth regression as discrete binary classification, whose output is an un-normalized distribution over possible depths for each pixel. Such output allows one to reliably and efficiently capture multi-modal depth distributions in ambiguous cases, such as depth discontinuities and reflective surfaces. Results on standard benchmarks show that our method produces accurate depth predictions and significantly better uncertainty estimations than prior art while running near real-time. Finally, by making use of uncertainties of the predicted distribution, we significantly reduce streak-like artifacts and improves accuracy as well as memory efficiency in 3D map reconstruction. Video and code can be found on the project website1.

[1]  Philip Koopman,et al.  Challenges in Autonomous Vehicle Testing and Validation , 2016 .

[2]  Nassir Navab,et al.  Deeper Depth Prediction with Fully Convolutional Residual Networks , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[3]  Rob Fergus,et al.  Depth Map Prediction from a Single Image using a Multi-Scale Deep Network , 2014, NIPS.

[4]  Dacheng Tao,et al.  Deep Ordinal Regression Network for Monocular Depth Estimation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[5]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Pushmeet Kohli,et al.  Multiple Choice Learning: Learning to Produce Multiple Structured Outputs , 2012, NIPS.

[7]  Jörg Stückler,et al.  Semi-Supervised Deep Learning for Monocular Depth Map Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Xiaoyan Hu,et al.  A Quantitative Evaluation of Confidence Measures for Stereo Vision , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Andreas Geiger,et al.  Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[10]  Deva Ramanan,et al.  Brute-Force Facial Landmark Analysis With A 140, 000-Way Classifier , 2018, AAAI.

[11]  Michael Cogswell,et al.  Stochastic Multiple Choice Learning for Training Diverse Deep Ensembles , 2016, NIPS.

[12]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Nicu Sebe,et al.  Multi-scale Continuous CRFs as Sequential Deep Networks for Monocular Depth Estimation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Liam Paull,et al.  Autonomous Vehicle Navigation in Rural Environments Without Detailed Prior Maps , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[15]  Roberto Cipolla,et al.  Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding , 2015, BMVC.

[16]  Ian D. Reid,et al.  Learning Depth from Single Monocular Images Using Deep Convolutional Neural Fields , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  S. Roth,et al.  Lightweight Probabilistic Deep Networks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[18]  Maximilian Baust,et al.  Learning in an Uncertain World: Representing Ambiguity Through Multiple Hypotheses , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[19]  J. Weickert,et al.  A Confidence Measure for Variational Optic flow Methods , 2006 .

[20]  Marc Pollefeys,et al.  Robust Dense Mapping for Large-Scale Dynamic Environments , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[21]  Alex Kendall,et al.  What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? , 2017, NIPS.

[22]  Alexei A. Efros,et al.  Geometric context from a single image , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[23]  Zoubin Ghahramani,et al.  Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning , 2015, ICML.

[24]  Wolfram Burgard,et al.  OctoMap: an efficient probabilistic 3D mapping framework based on octrees , 2013, Autonomous Robots.

[25]  Oisin Mac Aodha,et al.  Unsupervised Monocular Depth Estimation with Left-Right Consistency , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Ashutosh Saxena,et al.  Learning Depth from Single Monocular Images , 2005, NIPS.

[27]  Chunhua Shen,et al.  Estimating Depth From Monocular Images as Classification Using Deep Fully Convolutional Residual Networks , 2016, IEEE Transactions on Circuits and Systems for Video Technology.

[28]  Fei Gao,et al.  Real-time monocular dense mapping on aerial robots using visual-inertial fusion , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[29]  Amnon Shashua,et al.  On a Formal Model of Safe and Scalable Self-driving Cars , 2017, ArXiv.