Semantic-SuPer: A Semantic-aware Surgical Perception Framework for Endoscopic Tissue Classification, Reconstruction, and Tracking

Accurate and robust tracking and reconstruction of the surgical scene is a critical enabling technology for autonomous robotic surgery. Existing algorithms for 3D perception in surgery rely mainly on geometric information; we propose to also leverage semantic information inferred from the endoscopic video using image segmentation algorithms. In this paper, we present a novel, comprehensive surgical perception framework, Semantic-SuPer, that integrates geometric and semantic information to facilitate data association, 3D reconstruction, and tracking of endoscopic scenes, benefiting downstream tasks such as surgical navigation. The proposed framework is demonstrated on challenging endoscopic data with deforming tissue, showing its advantages over our baseline and several other state-of-the-art approaches. Our code and dataset are available at https://github.com/ucsdarclab/Python-SuPer.
