A New Distributional Ranking Loss With Uncertainty: Illustrated in Relative Depth Estimation

We propose a new approach for the problem of relative depth estimation from a single image. Instead of directly regressing over depth scores, we formulate the problem as estimation of a probability distribution over depth and aim to learn the parameters of the distributions which maximize the likelihood of the given data. To train our model, we propose a new ranking loss, Distributional Loss, which tries to increase the probability of farther pixel’s depth being greater than the closer pixel’s depth. Our proposed approach allows our model to output confidence in its estimation in the form of standard deviation of the distribution. We achieve state of the art results against a number of baselines while providing confidence in our estimations. Our analysis show that estimated confidence is actually a good indicator of accuracy. We investigate the usage of confidence information in a downstream task of metric depth estimation, to increase its performance.

[1]  Gregory N. Hullender,et al.  Learning to rank using gradient descent , 2005, ICML.

[2]  Noah Snavely,et al.  Unsupervised Learning of Depth and Ego-Motion from Video , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Gustavo Carneiro,et al.  Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue , 2016, ECCV.

[4]  Frank Hutter,et al.  SGDR: Stochastic Gradient Descent with Warm Restarts , 2016, ICLR.

[5]  Xueqi Cheng,et al.  Position-Aware ListMLE: A Sequential Learning Process for Ranking , 2014, UAI.

[6]  Davide Scaramuzza,et al.  REMODE: Probabilistic, monocular dense reconstruction in real time , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).

[7]  Zhe L. Lin,et al.  Structure-Guided Ranking Loss for Single Image Depth Prediction , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Dacheng Tao,et al.  Deep Ordinal Regression Network for Monocular Depth Estimation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9]  Renjie Liao,et al.  GeoNet: Geometric Neural Network for Joint Depth and Surface Normal Estimation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[10]  Simon Lucey,et al.  Learning Depth from Monocular Videos Using Direct Methods , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[11]  Tie-Yan Liu,et al.  Listwise approach to learning to rank: theory and algorithm , 2008, ICML '08.

[12]  Stephen Gould,et al.  Single image depth estimation from predicted semantic labels , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[13]  Nicu Sebe,et al.  Pattern-Affinitive Propagation Across Depth, Surface Normal and Semantic Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Alexei A. Efros,et al.  Automatic photo pop-up , 2005, ACM Trans. Graph..

[15]  Jon Almazán,et al.  Learning With Average Precision: Training Image Retrieval With a Listwise Loss , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[16]  Alexander H. Liu,et al.  Towards Scene Understanding: Unsupervised Monocular Depth Estimation With Semantic-Aware Representation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Philip H.S. Torr,et al.  Calibrating Deep Neural Networks using Focal Loss , 2020, NeurIPS.

[18]  Zhiguo Cao,et al.  Monocular Relative Depth Perception with Web Stereo Data Supervision , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[19]  Gözde B. Ünal,et al.  Relative Depth Estimation as a Ranking Problem , 2020, ArXiv.

[20]  Weifeng Chen,et al.  Single-Image Depth Perception in the Wild , 2016, NIPS.

[21]  Rob Fergus,et al.  Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[22]  Xiang Li,et al.  Joint Task-Recursive Learning for Semantic Segmentation and Depth Estimation , 2018, ECCV.

[23]  Yong Jae Lee,et al.  Cross-Domain Self-Supervised Multi-task Feature Learning Using Synthetic Imagery , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Tie-Yan Liu,et al.  Learning to rank: from pairwise approach to listwise approach , 2007, ICML '07.

[25]  Chunhua Shen,et al.  Estimating Depth From Monocular Images as Classification Using Deep Fully Convolutional Residual Networks , 2016, IEEE Transactions on Circuits and Systems for Video Technology.

[26]  Stan Sclaroff,et al.  Deep Metric Learning to Rank , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Tao Qin,et al.  Query-level loss functions for information retrieval , 2008, Inf. Process. Manag..

[28]  Tao Qin,et al.  FRank: a ranking method with fidelity loss , 2007, SIGIR.

[29]  Federico Tombari,et al.  CNN-SLAM: Real-Time Dense Monocular SLAM with Learned Depth Prediction , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Oisin Mac Aodha,et al.  Unsupervised Monocular Depth Estimation with Left-Right Consistency , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Nassir Navab,et al.  Deeper Depth Prediction with Fully Convolutional Residual Networks , 2016, 2016 Fourth International Conference on 3D Vision (3DV).

[32]  Rob Fergus,et al.  Depth Map Prediction from a Single Image using a Multi-Scale Deep Network , 2014, NIPS.

[33]  Weifeng Chen,et al.  Learning Single-Image Depth From Videos Using Quality Assessment Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Michael J. Black,et al.  Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Rich Caruana,et al.  Predicting good probabilities with supervised learning , 2005, ICML.

[36]  Milos Hauskrecht,et al.  Obtaining Well Calibrated Probabilities Using Bayesian Binning , 2015, AAAI.

[37]  Ashutosh Saxena,et al.  Learning Depth from Single Monocular Images , 2005, NIPS.

[38]  Ashutosh Saxena,et al.  High speed obstacle avoidance using monocular vision and reinforcement learning , 2005, ICML.

[39]  William T. Freeman,et al.  Learning Ordinal Relationships for Mid-Level Vision , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[40]  Nicu Sebe,et al.  PAD-Net: Multi-tasks Guided Prediction-and-Distillation Network for Simultaneous Depth Estimation and Scene Parsing , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[41]  Alan L. Yuille,et al.  Towards unified depth and semantic prediction from a single image , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).