Multi-Task Recurrent Convolutional Network with Correlation Loss for Surgical Video Analysis

Surgical tool presence detection and surgical phase recognition are two fundamental yet challenging tasks in surgical video analysis, and essential components of various applications in modern operating rooms. Although the two tasks are highly correlated in clinical practice, since the surgical process is typically well defined, most previous methods tackled them separately without making full use of their relatedness. In this paper, we present a novel multi-task recurrent convolutional network with correlation loss (MTRCNet-CL) that exploits this relatedness to simultaneously boost the performance of both tasks. Specifically, the proposed MTRCNet-CL model has an end-to-end architecture with two branches that share earlier feature encoders to extract general visual features while holding respective higher layers targeting the specific tasks. Given that temporal information is crucial for phase recognition, a long short-term memory (LSTM) network is employed to model sequential dependencies in the phase recognition branch. More importantly, a novel and effective correlation loss is designed to model the relatedness between tool presence and phase identification of each video frame by minimizing the divergence between the predictions of the two branches. By leveraging both low-level feature sharing and high-level prediction correlation, MTRCNet-CL encourages strong interaction between the two tasks, allowing each to benefit the other. Extensive experiments on a large surgical video dataset (Cholec80) demonstrate the outstanding performance of the proposed method, which consistently exceeds state-of-the-art methods by a large margin, e.g., 89.1% vs. 81.0% mAP for tool presence detection and 87.4% vs. 84.5% F1 score for phase recognition.
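The sketch below illustrates the described two-branch design in PyTorch (an assumed framework): a shared CNN encoder, a frame-wise tool-presence head, an LSTM-based phase-recognition branch, and a correlation loss penalizing divergence between the two branches' predictions. The ResNet-50 backbone, the seven-tool / seven-phase output sizes (as in Cholec80), the learned tool-to-phase mapping, and the KL-divergence form of the loss are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of a two-branch multi-task recurrent convolutional network with a
# correlation loss, loosely following the abstract's description. All architectural
# details below (ResNet-50 encoder, hidden sizes, KL-divergence correlation term)
# are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50


class MTRCNetCL(nn.Module):
    def __init__(self, num_tools=7, num_phases=7, feat_dim=2048, lstm_hidden=512):
        super().__init__()
        backbone = resnet50(weights=None)
        # Shared encoder: all backbone layers up to and including global average pooling.
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])
        # Tool branch: frame-wise multi-label prediction head.
        self.tool_head = nn.Linear(feat_dim, num_tools)
        # Phase branch: LSTM over the frame sequence, then frame-wise classification.
        self.lstm = nn.LSTM(feat_dim, lstm_hidden, batch_first=True)
        self.phase_head = nn.Linear(lstm_hidden, num_phases)
        # Hypothetical learned mapping from tool logits to the phase label space,
        # used only to compute the correlation loss between the two branches.
        self.tool_to_phase = nn.Linear(num_tools, num_phases)

    def forward(self, clips):
        # clips: (batch, time, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.encoder(clips.flatten(0, 1)).flatten(1)   # (b*t, feat_dim)
        tool_logits = self.tool_head(feats).view(b, t, -1)     # (b, t, num_tools)
        seq_feats, _ = self.lstm(feats.view(b, t, -1))         # (b, t, lstm_hidden)
        phase_logits = self.phase_head(seq_feats)              # (b, t, num_phases)
        return tool_logits, phase_logits

    def correlation_loss(self, tool_logits, phase_logits):
        # Penalize divergence between the phase distribution implied by the tool
        # branch (via the learned mapping) and the phase branch's own prediction.
        mapped = F.log_softmax(self.tool_to_phase(tool_logits), dim=-1)
        target = F.softmax(phase_logits, dim=-1)
        return F.kl_div(mapped, target, reduction="batchmean")
```

In training, this correlation term would typically be added, with a weighting factor, to the binary cross-entropy loss of the tool branch and the cross-entropy loss of the phase branch, so that gradients from the agreement constraint flow into both branches alongside their task-specific supervision.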
