Single- and Multi-Task Architectures for Surgical Workflow Challenge at M2CAI 2016

The surgical workflow challenge at M2CAI 2016 consists of identifying 8 surgical phases in cholecystectomy procedures. Here, we propose to use deep architectures that are based on our previous work where we presented several architectures to perform multiple recognition tasks on laparoscopic videos. In this technical report, we present the phase recognition results using two architectures: (1) a single-task architecture designed to perform solely the surgical phase recognition task and (2) a multi-task architecture designed to perform jointly phase recognition and tool presence detection. On top of these architectures we propose to use two different approaches to enforce the temporal constraints of the surgical workflow: (1) HMM-based and (2) LSTM-based pipelines. The results show that the LSTM-based approach is able to outperform the HMM-based approach and also to properly enforce the temporal constraints into the recognition process.

[1]  Nassir Navab,et al.  Workflow monitoring based on 3D motion features , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.

[2]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[3]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[4]  Michael Kranzfelder,et al.  A centralized data acquisition framework for operating theatres , 2015, 2015 17th International Conference on E-health Networking, Application & Services (HealthCom).

[5]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[6]  Andru Putra Twinanda,et al.  EndoNet: A Deep Architecture for Recognition Tasks on Laparoscopic Videos , 2016, IEEE Transactions on Medical Imaging.