Temporal Segmentation of Surgical Sub-tasks through Deep Learning with Multiple Data Sources

Many tasks in robot-assisted surgery (RAS) can be represented by finite-state machines (FSMs), where each state represents either an action (such as picking up a needle) or an observation (such as bleeding). A crucial step towards the automation of such surgical tasks is the temporal perception of the current surgical scene, which requires real-time estimation of the states in the FSM. The objective of this work is to estimate the current state of the surgical task based on the actions performed and events observed as the task progresses. We propose Fusion-KVE, a unified surgical state estimation model that incorporates multiple data sources: Kinematics, Vision, and system Events. Additionally, we examine the strengths and weaknesses of different state estimation models in segmenting states with different representative features or levels of granularity. We evaluate our model on the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), as well as a more complex dataset involving robotic intra-operative ultrasound (RIOUS) imaging, created using the da Vinci® Xi surgical system. Our model achieves a superior frame-wise state estimation accuracy of up to 89.4%, improving on state-of-the-art surgical state estimation models on both the JIGSAWS suturing dataset and our RIOUS dataset.
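To make the FSM representation concrete, the following is a minimal sketch (not taken from the paper) of a surgical sub-task modeled as a finite-state machine whose states are actions or observations. The state names, event names, and transition table here are hypothetical examples chosen for illustration only.

```python
class SurgicalTaskFSM:
    """Minimal finite-state machine over surgical sub-task states."""

    def __init__(self, states, transitions, start):
        self.states = set(states)        # states: actions or observations
        self.transitions = transitions   # (state, event) -> next state
        self.current = start

    def step(self, event):
        """Advance to the next state given an observed event, if the
        transition is defined; otherwise remain in the current state."""
        self.current = self.transitions.get((self.current, event), self.current)
        return self.current


# Hypothetical states for a suturing-like sub-task
fsm = SurgicalTaskFSM(
    states=["pick_up_needle", "position_needle", "push_through_tissue"],
    transitions={
        ("pick_up_needle", "needle_grasped"): "position_needle",
        ("position_needle", "needle_aligned"): "push_through_tissue",
    },
    start="pick_up_needle",
)
fsm.step("needle_grasped")  # advances to "position_needle"
```

A real-time state estimator in this framing would emit, at each frame, a prediction of which FSM state the task currently occupies, using kinematic, visual, and event features.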
