On the Uncertain Single-View Depths in Endoscopies

Estimating depth from endoscopic images is a prerequisite for a wide range of AI-assisted technologies, such as accurate localization, tumor measurement, and identification of non-inspected areas. Because the domain specificity of colonoscopies (a deformable, low-texture environment with fluids, poor lighting conditions, and abrupt sensor motions) poses challenges to multi-view approaches, single-view depth learning stands out as a promising line of research. In this paper, we explore for the first time Bayesian deep networks for single-view depth estimation in colonoscopies. Their uncertainty quantification offers great potential for such a critical application area. Our specific contribution is two-fold: 1) an exhaustive analysis of Bayesian deep networks for depth estimation on three different datasets, highlighting challenges and conclusions regarding synthetic-to-real domain changes and supervised vs. self-supervised methods; and 2) a novel teacher-student approach to deep depth learning that takes into account the teacher uncertainty.
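
To make the idea of an uncertainty-aware teacher-student scheme more concrete, the following is a minimal sketch and not the method of the paper itself. It assumes the teacher is a deep ensemble whose members each output a per-pixel depth mean and variance, and that the student is penalized less where the teacher is uncertain. The function names `ensemble_moments` and `uncertainty_weighted_loss`, the array shapes, and the exact form of the loss are all illustrative assumptions.

```python
import numpy as np


def ensemble_moments(means, variances):
    """Combine M ensemble members into one per-pixel mean and variance.

    means, variances: arrays of shape (M, H, W), hypothetical teacher outputs.
    The total variance is the average predicted (aleatoric) variance plus
    the variance of the member means (epistemic), as in a Gaussian mixture.
    """
    mu = means.mean(axis=0)
    aleatoric = variances.mean(axis=0)
    epistemic = means.var(axis=0)  # population variance across members
    return mu, aleatoric + epistemic


def uncertainty_weighted_loss(student_depth, teacher_mu, teacher_var, eps=1e-6):
    """Squared error to the teacher depth, down-weighted by teacher variance.

    Assumed form: mean over pixels of (d_s - mu_t)^2 / (2 * sigma_t^2).
    Pixels where the teacher is uncertain contribute less to the loss.
    """
    residual = student_depth - teacher_mu
    return float(np.mean(residual ** 2 / (2.0 * (teacher_var + eps))))


# Toy usage: two teacher members on a 2x2 image (values are made up).
means = np.stack([np.full((2, 2), 1.0), np.full((2, 2), 3.0)])
variances = np.full((2, 2, 2), 0.5)
teacher_mu, teacher_var = ensemble_moments(means, variances)
loss = uncertainty_weighted_loss(np.full((2, 2), 3.0), teacher_mu, teacher_var)
```

In this sketch the teacher variance acts purely as a per-pixel weight; a full Gaussian negative log-likelihood would add a log-variance term, which is constant with respect to the student and is therefore omitted here.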
