StreoScenNet: surgical stereo robotic scene segmentation

Surgical robot technology has made laparoscopic surgery safer and is ideally suited to procedures requiring minimal invasiveness. Semantic segmentation of robot-assisted surgery videos is an essential task in many computer-assisted robotic surgical systems, with applications that include instrument detection, tracking, and pose estimation. Usually, the left and right frames from the stereoscopic surgical instrument are segmented independently of each other. This approach is prone to poor segmentation because the stereo frames are not integrated into a single, more accurate estimate of the surgical scene. To address this problem, we propose a multi-encoder, single-decoder convolutional neural network named StreoScenNet, which exploits both the left and right frames of the stereoscopic surgical system. The proposed architecture consists of multiple ResNet encoder blocks and a stacked convolutional decoder network connected by a novel sum-skip connection. The input to the network is a set of left and right frames, and the output is a mask of the segmented regions for the left frame. The network is trained end-to-end, and segmentation is achieved without any pre- or post-processing. We compare the proposed architectures against state-of-the-art fully convolutional networks. We validate our methods on existing benchmark datasets that include robotic instruments as well as anatomical objects and non-robotic surgical instruments. Compared with previous instrument segmentation methods, our approach achieves a significantly improved Dice similarity coefficient.
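For readers unfamiliar with the evaluation metric, the Dice similarity coefficient between a predicted mask A and a reference mask B is 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch of that standard definition for binary masks, not the authors' evaluation code. The function name `dice_coefficient`, the nested-list mask representation, and the convention that two empty masks score 1.0 are assumptions made for illustration.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks.

    Masks are given as nested lists (rows of 0/1 values); this
    representation is an assumption chosen to keep the sketch
    dependency-free.
    """
    # Flatten both masks and coerce every pixel to 0 or 1.
    pred_flat = [int(bool(v)) for row in pred for v in row]
    target_flat = [int(bool(v)) for row in target for v in row]
    if len(pred_flat) != len(target_flat):
        raise ValueError("masks must have the same number of pixels")

    # |A ∩ B|: pixels that are foreground in both masks.
    intersection = sum(p * t for p, t in zip(pred_flat, target_flat))
    # |A| + |B|: total foreground pixels across the two masks.
    total = sum(pred_flat) + sum(target_flat)

    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy example: each mask has 3 foreground pixels, 2 of them shared.
predicted = [[0, 1, 1],
             [0, 1, 0]]
reference = [[0, 1, 0],
             [0, 1, 1]]
score = dice_coefficient(predicted, reference)
print(round(score, 4))
```

In the toy example the score is 2 * 2 / (3 + 3), so a value near 0.67 indicates partial overlap; identical masks give 1.0 and disjoint masks give 0.0.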
