Paper Information - Multi-Frame Feature Aggregation for Real-Time Instrument Segmentation in Endoscopic Video

Deep learning-based methods have achieved promising results on surgical instrument segmentation. However, their high computational cost may limit the application of deep models to time-sensitive tasks such as online surgical video analysis for robot-assisted surgery. Moreover, current methods may still struggle under challenging conditions in surgical images, such as varying lighting and the presence of blood. We propose a novel Multi-frame Feature Aggregation (MFFA) module that aggregates video frame features temporally and spatially in a recurrent mode. By distributing the computational load of deep feature extraction over sequential frames, we can use a lightweight encoder to reduce the computational cost at each time step. In addition, because public surgical videos are usually not labeled frame by frame, we develop a method that randomly synthesizes a surgical frame sequence from a single labeled frame to assist network training. We demonstrate that our approach outperforms the corresponding deeper segmentation models on two public surgery datasets.
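The idea of synthesizing a frame sequence from a single labeled frame can be illustrated with a minimal sketch. The abstract does not describe the actual synthesis procedure, so everything below is an assumption made for illustration: the helper names `shift_2d` and `synthesize_sequence`, the parameters `num_frames`, `max_shift`, and `seed`, and the choice of accumulated random wrap-around translations as a stand-in motion model. The one point the sketch is meant to show is that the image and its label mask receive the same transform, so every synthetic frame stays labeled.

```python
import random


def shift_2d(grid, dy, dx):
    """Translate a 2D list by (dy, dx) pixels, wrapping around at the borders."""
    h, w = len(grid), len(grid[0])
    return [[grid[(y - dy) % h][(x - dx) % w] for x in range(w)] for y in range(h)]


def synthesize_sequence(frame, mask, num_frames=5, max_shift=2, seed=0):
    """Build a pseudo-video from one labeled frame (hypothetical helper).

    Assumption: motion is simulated by accumulating small random translations.
    This is an illustrative stand-in, not the procedure used in the paper.
    """
    rng = random.Random(seed)
    frames, masks = [frame], [mask]
    dy = dx = 0
    for _ in range(num_frames - 1):
        # Accumulate the offset so consecutive frames differ by a small step.
        dy += rng.randint(-max_shift, max_shift)
        dx += rng.randint(-max_shift, max_shift)
        # Apply the identical shift to the image and to its label mask.
        frames.append(shift_2d(frame, dy, dx))
        masks.append(shift_2d(mask, dy, dx))
    return frames, masks


# Toy 4x6 grayscale "frame" with a 2x2 "instrument" region in the mask.
frame = [[10 * y + x for x in range(6)] for y in range(4)]
mask = [[1 if 1 <= y <= 2 and 2 <= x <= 3 else 0 for x in range(6)] for y in range(4)]
frames, masks = synthesize_sequence(frame, mask, num_frames=4)
```

Because both arrays are shifted together, the pixels under the mask in each synthetic frame are the same instrument pixels as in the original frame. A realistic pipeline would presumably use richer transforms and handle image borders more carefully than wrap-around does.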
