Depth estimation from single-shot monocular endoscope image using image domain adaptation and edge-aware depth estimation

This paper proposes a depth estimation method for single-shot monocular endoscopic images. Automated understanding of endoscopic images is important for diagnosis and treatment assistance. In addition to the images themselves, depth information makes tasks such as lesion-size measurement more accurate. Previous depth estimation methods have relied on stereo cameras or time-series images, but many endoscope imaging systems support neither stereo endoscopes nor video capture. Moreover, retrospective studies of endoscopic image analysis require automatic classification or recognition of a large number of previously stored single-shot monocular endoscopic images. We propose a depth estimation method for single-shot monocular endoscopic images that combines Lambertian surface translation by domain adaptation with depth estimation using a multi-scale edge loss. The main difficulty is that pairs of real endoscopic images and their corresponding depth images cannot be obtained, because size limitations prevent depth sensors from being attached to endoscopes. To tackle this, we employ a two-step estimation process: Lambertian surface translation trained on unpaired data, followed by depth estimation. Texture and specular reflections on the organ surface reduce the accuracy of depth estimation, so we apply Lambertian surface translation to the endoscopic image to remove them. We then estimate depth using a fully convolutional network (FCN). During training of the FCN, improving the similarity of object edges between the estimated depth image and the ground-truth depth image is important for obtaining better results; we therefore introduce a multi-scale edge loss function to improve depth estimation accuracy. We quantitatively evaluated the proposed method on real colonoscopic images. The estimated depth values were proportional to the real depth values.
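The exact form of the multi-scale edge loss is not given in this excerpt. A minimal sketch of one plausible formulation is shown below: Sobel edge maps of the estimated and ground-truth depth images are compared with an L1 penalty over an image pyramid. All function names, the choice of Sobel filters, and the pyramid depth are illustrative assumptions, not the authors' definition.

```python
import numpy as np

def sobel_edges(img):
    """Approximate edge magnitude with 3x3 Sobel filters (valid convolution)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(patch * kx)
            gy[i, j] = np.sum(patch * ky)
    return np.sqrt(gx ** 2 + gy ** 2)

def downsample(img):
    """Halve the resolution by 2x2 average pooling."""
    h, w = img.shape
    return img[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def multiscale_edge_loss(pred, gt, num_scales=3):
    """Mean absolute difference between the edge maps of the predicted and
    ground-truth depth images, averaged over a pyramid of scales."""
    loss = 0.0
    for _ in range(num_scales):
        loss += np.mean(np.abs(sobel_edges(pred) - sobel_edges(gt)))
        pred, gt = downsample(pred), downsample(gt)
    return loss / num_scales
```

In practice such a term would be implemented with framework convolutions and added to a pixel-wise depth loss with a weighting coefficient; the pyramid lets the loss penalize both fine edge detail and coarse organ contours.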
Furthermore, we applied the estimated depth images to automated anatomical location identification of colonoscopic images using a convolutional neural network. The identification accuracy of the network improved from 69.2% to 74.1% when the estimated depth images were used.
