Unsupervised colonoscopic depth estimation by domain translations with a Lambertian-reflection keeping auxiliary task

Extracting three-dimensional (3D) structure from a two-dimensional image is essential for developing a computer-aided diagnosis (CAD) system for colonoscopy. However, applying existing depth-estimation methods directly to colonoscopic images is impossible or inappropriate because of several limitations of colonoscopes. In particular, the absence of ground-truth depth for colonoscopic images hinders the application of supervised machine learning methods. To circumvent these difficulties, we propose a novel unsupervised depth-estimation method that introduces a Lambertian-reflection model as an auxiliary task to domain translation between real and virtual colonoscopic images. This auxiliary task contributes to accurate depth estimation by maintaining the Lambertian-reflection assumption. In our experiments, we qualitatively evaluate the proposed method by comparing it with state-of-the-art unsupervised methods. Furthermore, we present two quantitative evaluations of the proposed method using a measuring device, as well as a new 3D reconstruction technique and measured polyp sizes. In both quantitative evaluations, the proposed method achieved an average estimation error of less than 1 mm for regions close to the colonoscope. The qualitative evaluation showed that the auxiliary task reduces the effects of specular reflections and colon wall textures on depth estimation, and that the proposed method produces smooth depth estimates without noise. In summary, we developed an accurate depth-estimation method based on a new type of unsupervised domain translation with an auxiliary task. Because it can extract accurate 3D information, the method is useful for analyzing colonoscopic images and for developing a CAD system.
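
To make the Lambertian-reflection assumption concrete, the following is a minimal sketch of how a depth map could be rendered into a shading image under that assumption. It is not the auxiliary task as implemented in the paper: the function name `lambertian_shading`, the pinhole intrinsics `fx, fy, cx, cy`, the point light placed at the camera center, and the inverse-square falloff are all illustrative assumptions.

```python
import numpy as np

def lambertian_shading(depth, fx, fy, cx, cy, albedo=1.0):
    """Render a shading image from a depth map (hypothetical helper).

    Assumptions (not taken from the paper): pinhole camera, a point light
    co-located with the camera, constant albedo, inverse-square falloff.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))

    # Back-project every pixel to a 3D point in camera coordinates.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    pts = np.stack([x, y, depth], axis=-1)            # shape (h, w, 3)

    # Surface tangents by finite differences along image columns and rows.
    du = np.gradient(pts, axis=1)
    dv = np.gradient(pts, axis=0)

    # Normal from the cross product, ordered so it faces the camera.
    n = np.cross(dv, du)
    n = n / (np.linalg.norm(n, axis=-1, keepdims=True) + 1e-12)

    # Unit direction from each surface point toward the light at the origin.
    r = np.linalg.norm(pts, axis=-1)
    l = -pts / (r[..., None] + 1e-12)

    # Lambert's cosine law with inverse-square distance attenuation.
    cos_theta = np.clip(np.sum(n * l, axis=-1), 0.0, None)
    return albedo * cos_theta / (r ** 2 + 1e-12)

if __name__ == "__main__":
    flat = np.full((5, 5), 2.0)                       # flat wall 2 units away
    shading = lambertian_shading(flat, 1.0, 1.0, 2.0, 2.0)
    print(shading.round(4))
```

Under these assumptions brightness depends only on surface orientation and distance, so specular highlights and wall texture have no place in the rendered image, which is the intuition behind using such a model as a constraint.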
