Aggregating Long-Term Context for Learning Surgical Workflows

Analyzing surgical workflow is crucial for computers to understand surgeries. Deep learning techniques have recently been widely applied to recognizing surgical workflows. Many existing temporal neural network models are limited in their capability to handle long-term dependencies in the data and instead rely on the strong performance of the underlying per-frame visual models. We propose a new temporal network structure that leverages task-specific network representations to collect long-term sufficient statistics that are propagated by a sufficient statistics model (SSM). We apply our approach within an LSTM backbone for the task of surgical phase recognition and explore several choices for the propagated statistics. We demonstrate superior results over existing and novel state-of-the-art segmentation techniques on two laparoscopic cholecystectomy datasets: the previously published Cholec80 dataset and MGH100, a novel dataset with more challenging, yet clinically meaningful, segment labels.
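The core idea of propagating long-term sufficient statistics can be illustrated with a small sketch. The abstract does not specify which statistics are kept or how they enter the LSTM, so everything below is an assumption: we keep a running count, per-dimension sum, sum of squares and maximum of each frame's feature vector, and at every time step concatenate the current features with causal summaries (mean, standard deviation, max) of all frames seen so far. In the paper the features would be learned, task-specific representations; here they are plain lists of floats. The names `RunningStats` and `augment_sequence` are hypothetical and only illustrate the technique; the LSTM that would consume the augmented vectors is not shown.

```python
import math
from typing import List, Sequence


class RunningStats:
    """Hypothetical fixed-size sufficient statistics for a stream of d-dim features.

    Only the count, per-dimension sum, sum of squares and running max are stored,
    so the state size does not grow with video length.
    """

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.count = 0
        self.total = [0.0] * dim
        self.total_sq = [0.0] * dim
        self.maximum = [-math.inf] * dim

    def update(self, x: Sequence[float]) -> None:
        """Fold one frame's feature vector into the statistics (O(d) per frame)."""
        if len(x) != self.dim:
            raise ValueError("feature dimension mismatch")
        self.count += 1
        for i, v in enumerate(x):
            self.total[i] += v
            self.total_sq[i] += v * v
            self.maximum[i] = max(self.maximum[i], v)

    def mean(self) -> List[float]:
        return [s / self.count for s in self.total]

    def std(self) -> List[float]:
        # Population std from the first and second moments; clamp at 0
        # to absorb floating-point round-off.
        return [
            math.sqrt(max(sq / self.count - mu * mu, 0.0))
            for sq, mu in zip(self.total_sq, self.mean())
        ]


def augment_sequence(
    frames: List[List[float]], stats_kinds: Sequence[str] = ("mean", "max")
) -> List[List[float]]:
    """For each frame t, return x_t concatenated with statistics of frames 0..t.

    Hypothetical helper: the output is what one might feed to an LSTM at each
    step. It is causal, using only past and current frames, so it also works online.
    """
    if not frames:
        return []
    stats = RunningStats(dim=len(frames[0]))
    out: List[List[float]] = []
    for x in frames:
        stats.update(x)
        extra: List[float] = []
        for kind in stats_kinds:  # the "choice of propagated statistics"
            if kind == "mean":
                extra += stats.mean()
            elif kind == "std":
                extra += stats.std()
            elif kind == "max":
                extra += list(stats.maximum)
            else:
                raise ValueError(f"unknown statistic: {kind}")
        out.append(list(x) + extra)
    return out


frames = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]
for row in augment_sequence(frames, ("mean", "max")):
    print(row)
```

Because the summary has a fixed size, the per-frame cost of updating it stays constant however long the procedure runs, and the summary is available at every step regardless of what the recurrent hidden state manages to retain. Swapping entries in `stats_kinds` is one simple way to compare different choices of propagated statistics.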
