EndoSLAM Dataset and An Unsupervised Monocular Visual Odometry and Depth Estimation Approach for Endoscopic Videos: Endo-SfMLearner

Deep learning techniques hold promise for developing dense topography reconstruction and pose estimation methods for endoscopic videos. However, currently available datasets do not support effective quantitative benchmarking. In this paper, we introduce a comprehensive endoscopic SLAM dataset consisting of 3D point cloud data for six porcine organs, capsule and standard endoscopy recordings, synthetically generated data, and a recording of a phantom colon acquired with a conventional endoscope in clinical use, with computed tomography (CT) scan ground truth. A Panda robotic arm, two commercially available capsule endoscopes, three conventional endoscopes with different camera properties, two high-precision 3D scanners, and a CT scanner were employed to collect data from eight ex-vivo porcine gastrointestinal (GI)-tract organs and a silicone colon phantom model. In total, 35 sub-datasets with 6D pose ground truth are provided for the ex-vivo part: 18 sub-datasets for the colon, 12 for the stomach, and 5 for the small intestine, four of which contain polyp-mimicking elevations created by an expert gastroenterologist. To verify the applicability of the data to real clinical systems, we recorded a video sequence of a full-representation silicone colon phantom with a state-of-the-art colonoscope. Synthetic capsule endoscopy frames of the stomach, colon, and small intestine with both depth and pose annotations are included to facilitate the study of simulation-to-real transfer learning algorithms. Additionally, we propose Endo-SfMLearner, an unsupervised monocular depth and pose estimation method that combines residual networks with a spatial attention module to direct the network toward distinguishable, highly textured tissue regions. The proposed approach uses a brightness-aware photometric loss to improve robustness to the rapid frame-to-frame illumination changes common in endoscopic videos. To exemplify a use case of the EndoSLAM dataset, the performance of Endo-SfMLearner is compared extensively with the state of the art: SC-SfMLearner, Monodepth2, and SfMLearner. The code and a link to the dataset are publicly available at https://github.com/CapsuleEndoscope/EndoSLAM. A video demonstrating the experimental setup and procedure is accessible as Supplementary Video 1.
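To give a concrete sense of the brightness-aware photometric idea described above, below is a minimal PyTorch sketch, not the published Endo-SfMLearner implementation: the warped source frame is affinely aligned (per-image mean and standard deviation) to the target frame before the photometric error is computed, so that global illumination jumps between frames are not penalized as geometric error. The function names, the mean/std alignment, and the plain L1 term are illustrative assumptions; the paper's exact formulation (and its SSIM and attention-weighted components) may differ.

```python
import torch

def brightness_align(warped, target, eps=1e-6):
    """Affinely match the brightness statistics of the warped source frame
    to the target frame (per image and channel), as a simple proxy for a
    brightness-aware comparison. Tensors are (B, C, H, W)."""
    mu_w = warped.mean(dim=[2, 3], keepdim=True)
    mu_t = target.mean(dim=[2, 3], keepdim=True)
    std_w = warped.std(dim=[2, 3], keepdim=True)
    std_t = target.std(dim=[2, 3], keepdim=True)
    return (warped - mu_w) * (std_t / (std_w + eps)) + mu_t

def brightness_aware_photometric_loss(warped, target):
    """L1 photometric error on brightness-aligned frames (SSIM term omitted
    for brevity). `warped` is the source frame warped into the target view
    using the predicted depth and relative pose."""
    aligned = brightness_align(warped, target)
    return (aligned - target).abs().mean()
```

In an SfMLearner-style objective, a term like this would replace the plain photometric error between the target frame and the source frame warped through the predicted depth and pose; a spatial attention map could additionally reweight the per-pixel error toward textured tissue regions before averaging.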
