Anchor-guided online meta adaptation for fast one-shot instrument segmentation from robotic surgical videos

The scarcity of annotated surgical data in robot-assisted surgery (RAS) has motivated prior work to adapt knowledge from related domains, achieving promising segmentation results on surgical images. For dense instrument tracking in a robotic surgical video, collecting one initial scene that specifies the target instruments (or parts of tools) is both desirable and feasible during preoperative preparation. In this paper, we study the challenging problem of one-shot instrument segmentation for robotic surgical videos, in which only the first-frame mask of each video is provided at test time, so that the pre-trained model (learned from an easily accessible source) can adapt to the target instruments. Straightforward methods transfer the domain knowledge by fine-tuning the model on each given mask. Such one-shot optimization takes hundreds of iterations, which makes the test runtime impractical. We present anchor-guided online meta adaptation (AOMA) for this problem. We achieve fast one-shot test-time optimization by meta-learning a good model initialization and learning rates from source videos, avoiding laborious, handcrafted fine-tuning. These two trainable components are optimized in a video-specific task space with a matching-aware loss. Furthermore, we design an anchor-guided online adaptation scheme to counter the performance drop over the course of a robotic surgical sequence: the model is continuously adapted on motion-insensitive pseudo-masks supported by anchor matching. AOMA achieves state-of-the-art results in two practical scenarios, (1) general videos to surgical videos and (2) public surgical videos to in-house surgical videos, while substantially reducing the test runtime.
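The idea of meta-learning both an initialization and learning rates, so that a single gradient step on a small support set already adapts the model, can be illustrated on a toy problem. The sketch below is not the AOMA implementation: it assumes a two-parameter linear regressor instead of a segmentation network, a plain squared-error loss instead of the matching-aware loss, and it omits anchor matching and online adaptation entirely. All names (`sample_task`, `adapt`, `meta_train`, `evaluate`, `MEAN_W`) are hypothetical, and the clipping of the learning rates is an added safeguard for this toy setting only.

```python
import numpy as np

MEAN_W = np.array([2.0, -1.0])  # hypothetical centre of the toy task distribution


def grad_and_hessian(w, X, y):
    """Gradient and Hessian of the mean squared error of the linear model X @ w."""
    residual = X @ w - y
    return 2.0 * X.T @ residual / len(y), 2.0 * X.T @ X / len(y)


def loss(w, X, y):
    """Mean squared error of the linear model X @ w."""
    return float(np.mean((X @ w - y) ** 2))


def sample_task(rng, n_support=10, n_query=20):
    """Draw one task (a random line) with a small support set and a query set."""
    w_true = MEAN_W + 0.3 * rng.normal(size=2)

    def draw(n):
        X = np.column_stack([rng.uniform(-1.0, 1.0, n), np.ones(n)])
        return X, X @ w_true

    return draw(n_support), draw(n_query)


def adapt(w0, alpha, X_s, y_s):
    """One gradient step from the initialization w0 with per-parameter rates alpha."""
    g_s, _ = grad_and_hessian(w0, X_s, y_s)
    return w0 - alpha * g_s


def meta_train(steps=2000, meta_lr=0.01, seed=0):
    """Learn w0 and alpha so that one adaptation step gives a low query loss."""
    rng = np.random.default_rng(seed)
    w0 = np.zeros(2)
    alpha = np.full(2, 0.01)
    for _ in range(steps):
        (X_s, y_s), (X_q, y_q) = sample_task(rng)
        g_s, H_s = grad_and_hessian(w0, X_s, y_s)
        w_adapted = w0 - alpha * g_s
        g_q, _ = grad_and_hessian(w_adapted, X_q, y_q)
        # Chain rule through the inner step w_adapted = w0 - alpha * g_s(w0):
        # d w_adapted / d w0 = I - diag(alpha) @ H_s, d w_adapted / d alpha = -diag(g_s).
        grad_w0 = g_q - H_s @ (alpha * g_q)
        grad_alpha = -g_s * g_q
        w0 = w0 - meta_lr * grad_w0
        # Clipping is an assumption of this sketch to keep the rates positive and bounded.
        alpha = np.clip(alpha - meta_lr * grad_alpha, 1e-4, 1.0)
    return w0, alpha


def evaluate(w0, alpha, n_tasks=200, seed=1):
    """Average query loss after one adaptation step on freshly sampled tasks."""
    rng = np.random.default_rng(seed)
    losses = []
    for _ in range(n_tasks):
        (X_s, y_s), (X_q, y_q) = sample_task(rng)
        losses.append(loss(adapt(w0, alpha, X_s, y_s), X_q, y_q))
    return float(np.mean(losses))


if __name__ == "__main__":
    w0, alpha = meta_train()
    print("one-step loss, untrained init:", evaluate(np.zeros(2), np.full(2, 0.01)))
    print("one-step loss, meta-learned init and rates:", evaluate(w0, alpha))
```

In this toy setting the meta-learned initialization and rates replace many hand-tuned fine-tuning iterations with a single update, which mirrors the runtime argument made in the abstract; how that carries over to segmentation networks depends on details not given here.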
