TeCNO: Surgical Phase Recognition with Multi-Stage Temporal Convolutional Networks

Automatic surgical phase recognition is a challenging and crucial task with the potential to improve patient safety and become an integral part of intra-operative decision-support systems. In this paper, we propose, for the first time in workflow analysis, a Multi-Stage Temporal Convolutional Network (MS-TCN) that performs hierarchical prediction refinement for surgical phase recognition. Causal, dilated convolutions allow for a large receptive field and online inference with smooth predictions even during ambiguous transitions. Our method is thoroughly evaluated on two datasets of laparoscopic cholecystectomy videos, with and without the use of additional surgical tool information. The proposed causal MS-TCN outperforms various state-of-the-art LSTM approaches, which verifies its suitability for surgical phase recognition.
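To make the role of causal, dilated convolutions more concrete, the following is a minimal NumPy sketch and not the authors' implementation. The function names, the kernel size of 3, and the doubling dilation schedule are illustrative assumptions; the abstract does not state these values. The sketch shows that each output at time t depends only on current and past inputs, which is what permits online inference, and that the receptive field grows with the sum of the dilations.

```python
import numpy as np


def causal_dilated_conv1d(x, weights, dilation):
    """Causal dilated 1D convolution (hypothetical helper).

    Computes y[t] = sum_j weights[j] * x[t - j * dilation],
    treating samples before the start of the sequence as zero,
    so y[t] never depends on x[t + 1], x[t + 2], ...
    """
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    for j, w in enumerate(weights):
        shift = j * dilation
        if shift == 0:
            y += w * x
        elif shift < len(x):
            # Add the input delayed by `shift` frames.
            y[shift:] += w * x[:-shift]
    return y


def receptive_field(kernel_size, dilations):
    """Number of past-and-present frames seen by a stack of
    causal dilated layers with the given dilation factors."""
    return 1 + (kernel_size - 1) * sum(dilations)


# Assumed example: kernel size 3, dilations doubling per layer.
dilations = [2 ** layer for layer in range(10)]
rf = receptive_field(3, dilations)
```

Under these assumed settings, ten layers already cover 2047 frames, while the per-layer cost stays that of a kernel of size 3. A stack of undilated layers with the same kernel would need over a thousand layers to reach the same temporal context.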
