Incorporating Temporal Prior from Motion Flow for Instrument Segmentation in Minimally Invasive Surgery Video

Automatic instrument segmentation in video is a fundamental yet challenging problem for robot-assisted minimally invasive surgery. In this paper, we propose a novel framework that leverages instrument motion information by incorporating a derived temporal prior into an attention pyramid network for accurate segmentation. Our inferred prior provides a reliable indication of the instrument location and shape, and is propagated from the previous frame to the current frame according to inter-frame motion flow. This prior is injected into the middle of an encoder-decoder segmentation network as the initialization of a pyramid of attention modules, which explicitly guide the segmentation output from coarse to fine. In this way, the temporal dynamics and the attention network effectively complement and benefit each other. As an additional use, our temporal prior enables semi-supervised learning with periodically unlabeled video frames, simply by reverse execution. We extensively validate our method on the public 2017 MICCAI EndoVis Robotic Instrument Segmentation Challenge dataset with three different tasks. Our method consistently exceeds the state-of-the-art results across all three tasks by a large margin. Our semi-supervised variant also demonstrates promising potential for reducing annotation cost in clinical practice.
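To make the flow-based propagation step concrete, below is a minimal PyTorch-style sketch (not the authors' released code) of how a previous-frame instrument mask might be warped to the current frame using a motion-flow field and then used to initialize an attention map inside an encoder-decoder network. The function names `warp_mask_with_flow` and `init_attention_with_prior`, and the assumption that the flow maps current-frame pixels back to the previous frame, are illustrative.

```python
import torch
import torch.nn.functional as F


def warp_mask_with_flow(prev_mask, flow):
    """Propagate the previous-frame mask to the current frame via motion flow.

    prev_mask: (B, 1, H, W) soft segmentation mask from frame t-1
    flow:      (B, 2, H, W) flow from frame t to frame t-1 (x, y displacements in pixels)
    """
    b, _, h, w = prev_mask.shape
    # Base sampling grid of pixel coordinates (x, y) for the current frame.
    ys, xs = torch.meshgrid(
        torch.arange(h, device=prev_mask.device),
        torch.arange(w, device=prev_mask.device),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=-1).float()            # (H, W, 2)
    grid = grid.unsqueeze(0).expand(b, -1, -1, -1) + flow.permute(0, 2, 3, 1)
    # Normalize coordinates to [-1, 1] as required by grid_sample.
    grid[..., 0] = 2.0 * grid[..., 0] / (w - 1) - 1.0
    grid[..., 1] = 2.0 * grid[..., 1] / (h - 1) - 1.0
    # Backward warping: sample the previous mask at the flow-displaced locations.
    return F.grid_sample(prev_mask, grid, mode="bilinear", align_corners=True)


def init_attention_with_prior(feature, prior):
    """Use the propagated prior to initialize a single level of the attention pyramid."""
    prior = F.interpolate(prior, size=feature.shape[-2:],
                          mode="bilinear", align_corners=False)
    # Emphasize feature responses at locations the prior marks as instrument.
    return feature * (1.0 + prior)
```

In a full pipeline, the warped prior would be downsampled to each decoder resolution so that coarser attention modules are initialized first and refined level by level, consistent with the coarse-to-fine guidance described above.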
