Gesture Recognition in Robotic Surgery: A Review

Objective: Surgical activity recognition is a fundamental step in computer-assisted interventions. This paper reviews the state of the art in methods for automatic recognition of fine-grained gestures in robotic surgery, focusing on recent data-driven approaches, and outlines open questions and future research directions. Methods: An article search was performed on 5 bibliographic databases with the following search terms: robotic, robot-assisted, JIGSAWS, surgery, surgical, gesture, fine-grained, surgeme, action, trajectory, segmentation, recognition, parsing. Selected articles were classified by the level of supervision required for training and grouped according to the major frameworks for time series analysis and data modelling. Results: A total of 52 articles were reviewed. The research field is expanding rapidly, with the majority of articles published in the last 4 years. Deep-learning-based temporal models with discriminative feature extraction and multi-modal data integration have demonstrated promising results on small surgical datasets. Currently, unsupervised methods perform significantly worse than supervised approaches. Conclusion: The development of large and diverse open-source datasets of annotated demonstrations is essential for the development and validation of robust solutions for surgical gesture recognition. While new strategies for discriminative feature extraction and knowledge transfer, or unsupervised and semi-supervised approaches, can mitigate the need for data and labels, they have not yet been demonstrated to achieve comparable performance. Important future research directions include the detection and forecasting of gesture-specific errors and anomalies. Significance: This paper is a comprehensive and structured analysis of surgical gesture recognition methods that aims to summarize the status of this rapidly evolving field.
