Exploiting the potential of unlabeled endoscopic video data with self-supervised learning

Purpose: Surgical data science is a new research field that aims to observe all aspects of the patient treatment process in order to provide the right assistance at the right time. Due to the breakthrough successes of deep-learning-based solutions for automatic image annotation, the availability of reference annotations for algorithm training is becoming a major bottleneck in the field. The purpose of this paper was to investigate the concept of self-supervised learning to address this issue.

Methods: Our approach is guided by the hypothesis that unlabeled video data can be used to learn a representation of the target domain that boosts the performance of state-of-the-art machine learning algorithms when used for pre-training. The core of the method is an auxiliary task, based on raw endoscopic video data of the target domain, that is used to initialize the convolutional neural network (CNN) for the target task. In this paper, we propose the re-colorization of medical images with a conditional generative adversarial network (cGAN)-based architecture as the auxiliary task. A variant of the method involves a second pre-training step based on labeled data for the target task from a related domain. We validate both variants using medical instrument segmentation as the target task.

Results: The proposed approach can be used to radically reduce the manual annotation effort involved in training CNNs. Compared to the baseline approach of generating annotated data from scratch, our method reduced the number of labeled images required by up to 75% in our exploratory study without sacrificing performance. Our method also outperforms alternative methods for CNN pre-training, such as pre-training on publicly available non-medical data (COCO) or medical data (MICCAI EndoVis 2017 challenge) using the target task (in this instance: segmentation).

Conclusion: As it makes efficient use of available public and non-public, labeled and unlabeled data, the approach has the potential to become a valuable tool for CNN (pre-)training.
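To make the two-stage idea concrete, the following is a minimal illustrative sketch in PyTorch (an assumption; the paper does not provide this code). Stage 1 pre-trains an encoder-decoder generator with a patch-based cGAN on the self-supervised re-colorization task, predicting the color channels of unlabeled endoscopic frames from their grayscale channel; stage 2 reuses the pre-trained encoder weights to initialize the CNN for the target segmentation task. All class names, layer sizes, and hyperparameters (EncoderDecoder, PatchDiscriminator, the L1 weight of 100, etc.) are illustrative assumptions, not the authors' actual architecture.

```python
import torch
import torch.nn as nn


class EncoderDecoder(nn.Module):
    """Small encoder-decoder CNN, a stand-in used here for both re-colorization and segmentation."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, out_ch, 1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))


class PatchDiscriminator(nn.Module):
    """PatchGAN-style discriminator conditioned on the grayscale input."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1 + 2, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 4, padding=1),
        )

    def forward(self, gray, ab):
        # Concatenate the condition (grayscale) with the candidate color channels.
        return self.net(torch.cat([gray, ab], dim=1))


def pretrain_recolorization(generator, discriminator, loader, epochs=1):
    """Stage 1: adversarial re-colorization on unlabeled frames (grayscale -> color)."""
    g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
    d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
    bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()
    for _ in range(epochs):
        for gray, color in loader:  # gray: Bx1xHxW, color: Bx2xHxW (e.g. Lab "ab" channels)
            fake = generator(gray)
            # Discriminator step: real (gray, color) pairs vs. generated pairs.
            d_real = discriminator(gray, color)
            d_fake = discriminator(gray, fake.detach())
            d_loss = bce(d_real, torch.ones_like(d_real)) + \
                     bce(d_fake, torch.zeros_like(d_fake))
            d_opt.zero_grad(); d_loss.backward(); d_opt.step()
            # Generator step: fool the discriminator plus an L1 reconstruction term.
            d_fake = discriminator(gray, fake)
            g_loss = bce(d_fake, torch.ones_like(d_fake)) + 100.0 * l1(fake, color)
            g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return generator


def build_segmentation_net(pretrained_generator, num_classes=2):
    """Stage 2: initialize the target-task CNN from the pre-trained encoder."""
    seg_net = EncoderDecoder(in_ch=1, out_ch=num_classes)
    seg_net.encoder.load_state_dict(pretrained_generator.encoder.state_dict())
    return seg_net  # then fine-tune on the (much smaller) labeled segmentation set
```

In the second variant described above, the network returned by build_segmentation_net would additionally be fine-tuned on labeled segmentation data from a related domain (e.g., a public challenge dataset) before training on the target-domain labels.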
