The surgical workflow challenge at M2CAI 2016 consists of identifying 8 surgical phases in cholecystectomy procedures. In Fig. 1, we show the defined phases as well as the phase transitions observed in the m2cai2016-workflow dataset [3,6]. The training dataset, released on May 23, 2016, consists of 27 cholecystectomy videos annotated with the phases at 25 fps, while the testing dataset, released on September 9, 2016, consists of 14 videos.
Here, we propose to use deep architectures to perform the phase recognition task. This work builds on our previous work [6], where we presented several network architectures for two recognition tasks on laparoscopic videos: surgical phase recognition and tool presence detection. Ultimately, we proposed an architecture designed to perform both tasks jointly. In this work, we use both single-task and multi-task networks to learn discriminative visual features from the dataset.
Surgical procedures are naturally performed according to a pre-defined surgical workflow. Thus, to properly perform surgical phase recognition, it is important to enforce the temporal constraints coming from the surgical workflow. The networks, however, only accept images in a frame-wise manner, so no temporal information is incorporated in the results they give. Therefore, an additional pipeline is required to enforce these temporal constraints. In [6], we enforce the surgical workflow constraint using an approach based on a hidden Markov model (HMM). However, HMMs work under the Markov assumption, in which the current state depends only on the previous state. In addition, the number of states passed along a sequence is typically limited to the number of classes defined in the problem. These limitations are not present in long short-term memory (LSTM) networks. In this work, we therefore also perform the surgical phase recognition task using an LSTM network and compare the recognition results to the ones obtained by the HMM pipeline.
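To make the role of the temporal pipeline concrete, the following is a minimal sketch of how an HMM can smooth frame-wise predictions with Viterbi decoding. It is not the pipeline used in [6]: the function name `viterbi_smooth`, the three-phase left-to-right transition matrix, and the per-frame probabilities are all hypothetical, and treating the network's class probabilities directly as emission scores is a simplifying assumption made for illustration.

```python
import math


def viterbi_smooth(frame_probs, transition, initial):
    """Return the most likely phase sequence under a first-order HMM.

    frame_probs: per-frame class probabilities (used here as emission
                 scores, which is an illustrative assumption).
    transition:  transition[i][j] = P(phase j at t | phase i at t-1).
    initial:     initial[i] = P(phase i at the first frame).
    """
    eps = 1e-12  # floor that avoids log(0) for forbidden transitions
    n_states = len(initial)

    def log(p):
        return math.log(max(p, eps))

    # score[j]: best log-probability of any path ending in state j
    score = [log(initial[j]) + log(frame_probs[0][j]) for j in range(n_states)]
    backptr = []

    for probs in frame_probs[1:]:
        new_score, ptr = [], []
        for j in range(n_states):
            # Markov assumption: only the previous state matters
            best_i = max(range(n_states),
                         key=lambda i: score[i] + log(transition[i][j]))
            new_score.append(score[best_i] + log(transition[best_i][j])
                             + log(probs[j]))
            ptr.append(best_i)
        score = new_score
        backptr.append(ptr)

    # Backtrack from the best final state
    state = max(range(n_states), key=lambda j: score[j])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical workflow with 3 phases that can only move forward
transition = [[0.9, 0.1, 0.0],
              [0.0, 0.9, 0.1],
              [0.0, 0.0, 1.0]]
initial = [1.0, 0.0, 0.0]

# Hypothetical frame-wise network outputs; the third frame is a noisy outlier
frame_probs = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.2, 0.1, 0.7],
               [0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.1, 0.8, 0.1],
               [0.1, 0.2, 0.7], [0.1, 0.1, 0.8]]

framewise = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
smoothed = viterbi_smooth(frame_probs, transition, initial)
print(framewise)  # [0, 0, 2, 0, 1, 1, 2, 2]
print(smoothed)   # [0, 0, 0, 0, 1, 1, 2, 2]
```

In this toy example the frame-wise prediction jumps to the last phase and back, which the workflow does not allow; the decoded sequence removes that jump. The recursion also shows the limitation noted above: each step looks only at the previous state, and the number of states equals the number of phase classes.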
[1] Andru Putra Twinanda, et al. EndoNet: A Deep Architecture for Recognition Tasks on Laparoscopic Videos. IEEE Transactions on Medical Imaging, 2016.
[2] Trevor Darrell, et al. Caffe: Convolutional Architecture for Fast Feature Embedding. ACM Multimedia, 2014.
[3] Michael Kranzfelder, et al. A centralized data acquisition framework for operating theatres. 17th International Conference on E-health Networking, Application & Services (HealthCom), 2015.
[4] Nassir Navab, et al. Workflow monitoring based on 3D motion features. IEEE 12th International Conference on Computer Vision Workshops (ICCV Workshops), 2009.