Aggregating Long-Term Context for Learning Laparoscopic and Robot-Assisted Surgical Workflows

Analyzing surgical workflow is crucial for surgical assistance robots to understand surgeries. With an understanding of the complete surgical workflow, robots can assist surgeons during intra-operative events, for example by issuing a warning when the surgeon is entering specific key or high-risk phases. Deep learning techniques have recently been widely applied to recognizing surgical workflows. Many existing temporal neural network models are limited in their ability to handle long-term dependencies in the data and instead rely on the strong performance of the underlying per-frame visual models. We propose a new temporal network structure that leverages task-specific network representations to collect long-term sufficient statistics, which are propagated by a sufficient statistics model (SSM). We implement our approach within an LSTM backbone for the task of surgical phase recognition and explore several choices of propagated statistics. We demonstrate superior results over existing and novel state-of-the-art segmentation techniques on two laparoscopic cholecystectomy datasets: the publicly available Cholec80 dataset and MGH100, a novel dataset with more challenging and clinically meaningful segment labels.
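The idea of carrying long-term statistics alongside per-frame features can be illustrated with a small sketch. The abstract does not specify which statistics the SSM propagates or how they enter the LSTM, so everything below is an assumption made for illustration: the statistic is taken to be the running average of per-frame phase probabilities, and the function names `running_phase_statistics` and `augment_features` are hypothetical.

```python
from typing import List


def running_phase_statistics(phase_probs: List[List[float]]) -> List[List[float]]:
    """Return, for each frame t, the mean phase probability over frames 1..t.

    Assumption: a running average stands in for the "long-term statistics";
    the paper's actual choice of statistics is not given in the abstract.
    """
    num_phases = len(phase_probs[0]) if phase_probs else 0
    totals = [0.0] * num_phases  # cumulative probability mass per phase
    stats = []
    for t, probs in enumerate(phase_probs, start=1):
        for k, p in enumerate(probs):
            totals[k] += p
        # Normalize by elapsed frames so the statistic stays bounded in [0, 1].
        stats.append([total / t for total in totals])
    return stats


def augment_features(frame_features: List[List[float]],
                     stats: List[List[float]]) -> List[List[float]]:
    """Concatenate each frame's visual features with its running statistics."""
    return [features + stat for features, stat in zip(frame_features, stats)]


# Toy example: three frames, two phases, one visual feature per frame.
probs = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
stats = running_phase_statistics(probs)
augmented = augment_features([[0.1], [0.2], [0.3]], stats)
```

In a setup like this, the augmented vectors would be fed to the temporal model in place of the raw per-frame features, giving it a summary of the whole history that does not have to survive in the recurrent state.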
