Multi-Frame Feature Aggregation for Real-Time Instrument Segmentation in Endoscopic Video

Deep learning-based methods have achieved promising results on surgical instrument segmentation. However, their high computation cost may limit the application of deep models to time-sensitive tasks such as online surgical video analysis for robotic-assisted surgery. Moreover, current methods may still struggle under challenging conditions in surgical images, such as varying lighting and the presence of blood. We propose a novel Multi-frame Feature Aggregation (MFFA) module that aggregates video frame features temporally and spatially in a recurrent mode. By distributing the computation load of deep feature extraction over sequential frames, we can use a lightweight encoder to reduce the computation cost at each time step. Moreover, public surgical videos are usually not labeled frame by frame, so we develop a method that randomly synthesizes a surgical frame sequence from a single labeled frame to assist network training. We demonstrate that our approach achieves superior performance to corresponding deeper segmentation models on two public surgery datasets.
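
To illustrate the idea of synthesizing a frame sequence from a single labeled frame, the following is a minimal sketch and not the method used in this work. The abstract does not specify which transformations are applied, so the sketch assumes a small random translation accumulated over time, with a circular shift (np.roll) standing in for real camera or instrument motion. The function name synthesize_sequence and the parameters num_frames, max_shift, and seed are hypothetical.

```python
import numpy as np


def synthesize_sequence(frame, mask, num_frames=5, max_shift=4, seed=0):
    """Build a pseudo video clip from one labeled frame (illustrative only).

    frame: (H, W, C) image array; mask: (H, W) label array.
    Assumption: motion is modeled as an accumulated random translation,
    applied identically to the image and its mask so they stay aligned.
    """
    rng = np.random.default_rng(seed)
    frames, masks = [], []
    dy, dx = 0, 0  # accumulated offset; the first frame is unshifted
    for _ in range(num_frames):
        # Circular shift along the height and width axes only.
        frames.append(np.roll(frame, shift=(dy, dx), axis=(0, 1)))
        masks.append(np.roll(mask, shift=(dy, dx), axis=(0, 1)))
        # Draw the next random step in [-max_shift, max_shift].
        step = rng.integers(-max_shift, max_shift + 1, size=2)
        dy += int(step[0])
        dx += int(step[1])
    return np.stack(frames), np.stack(masks)
```

Because the same offset is applied to the image and the mask, every synthesized frame comes with a consistent label. A more realistic variant would presumably move the instrument and background separately and fill the uncovered regions, but those details are not given here.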
