LRTD: long-range temporal dependency based active learning for surgical workflow recognition

Automatic surgical workflow recognition in video is a fundamental yet challenging problem for developing computer-assisted and robotic-assisted surgery. Existing deep learning approaches have achieved remarkable performance on surgical video analysis, but they rely heavily on large-scale labelled datasets. Unfortunately, annotations are rarely available in abundance, because producing them requires the domain knowledge of surgeons. Even for experts, annotating a sufficient amount of data is tedious and time-consuming. In this paper, we propose a novel active learning method for cost-effective surgical video analysis. Specifically, we propose a non-local recurrent convolutional network, which introduces a non-local block to capture the long-range temporal dependency (LRTD) among continuous frames. We then formulate an intra-clip dependency score to represent the overall dependency within a clip. By ranking these scores among the clips in the unlabelled data pool, we select the clips with weak dependencies for annotation, since these are the most informative ones for network training. We validate our approach on a large surgical video dataset (Cholec80) on the surgical workflow recognition task. With our LRTD-based selection strategy, we outperform other state-of-the-art active learning methods that only consider neighbor-frame information. Using at most 50% of the samples, our approach can exceed the performance of full-data training. By modeling the intra-clip dependency, our LRTD-based strategy shows a stronger capability to select informative video clips for annotation than other active learning methods, as evaluated on a popular public surgical dataset. The results also show the promising potential of our framework for reducing annotation workload in clinical practice.
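
The selection step described above (score each unlabelled clip by its intra-clip dependency, rank the clips, and annotate the weakest ones) can be illustrated with a minimal sketch. This is not the formulation used in the paper: the abstract does not give the exact score, so the sketch assumes, purely for illustration, that the dependency between two frames is the cosine similarity of their feature vectors and that the clip score is the mean of the off-diagonal entries of the resulting matrix. The function names `intra_clip_dependency_score` and `select_clips_to_annotate` and the `budget` parameter are hypothetical.

```python
import numpy as np


def intra_clip_dependency_score(features):
    """Return one scalar summarising how strongly the frames of a clip depend on each other.

    `features` is a (T, D) array holding one feature vector per frame, with T >= 2.
    ASSUMPTION: pairwise dependency is approximated by cosine similarity; the
    paper derives it from a non-local block, which is not reproduced here.
    """
    feats = np.asarray(features, dtype=float)
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    unit = feats / np.clip(norms, 1e-12, None)  # guard against zero vectors
    dependency = unit @ unit.T                  # (T, T) pairwise dependency matrix
    t = dependency.shape[0]
    off_diagonal = dependency[~np.eye(t, dtype=bool)]  # drop self-dependencies
    return float(off_diagonal.mean())


def select_clips_to_annotate(clip_features, budget):
    """Rank unlabelled clips by dependency score and pick the `budget` weakest ones."""
    scores = np.array([intra_clip_dependency_score(f) for f in clip_features])
    order = np.argsort(scores, kind="stable")   # ascending: weakest dependency first
    return order[:budget].tolist(), scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical pool: five clips of 10 frames with 16-dimensional features.
    pool = [rng.normal(size=(10, 16)) for _ in range(5)]
    selected, scores = select_clips_to_annotate(pool, budget=2)
    print("selected clip indices:", selected)
    print("scores:", np.round(scores, 3))
```

In an active learning loop, the selected clips would be sent for annotation, added to the labelled set, and the network retrained before the remaining pool is scored again.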
