Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery

Automatic surgical gesture recognition is fundamentally important for enabling intelligent cognitive assistance in robotic surgery. With recent advances in robot-assisted minimally invasive surgery, rich information including surgical videos and robotic kinematics can be recorded, and these two sources provide complementary knowledge for understanding surgical gestures. However, existing methods either rely solely on uni-modal data or directly concatenate multi-modal representations, and so cannot sufficiently exploit the informative correlations inherent in visual and kinematics data to improve gesture recognition accuracy. To address this, we propose a novel online approach, the multi-modal relational graph network (MRG-Net), which dynamically integrates visual and kinematics information through interactive message propagation in the latent feature space. Specifically, we first extract embeddings from video and kinematics sequences with temporal convolutional networks and LSTM units. Next, we identify multiple relations in these multi-modal embeddings and leverage them through a hierarchical relational graph learning module. The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset, where it outperforms current uni-modal and multi-modal methods on both the suturing and knot-tying tasks. Furthermore, we validate our method on in-house visual-kinematics datasets collected with da Vinci Research Kit (dVRK) platforms at two centers, where it achieves consistently promising performance. Our code and data are released at: https://www.cse.cuhk.edu.hk/~yhlong/mrgnet.html.
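
To make the idea of interactive message propagation between modalities concrete, the following is a minimal sketch of one generic relational graph convolution step over a visual node and a kinematics node. It is not the released MRG-Net implementation: the node names, the relation names (`visual_to_kin`, `kin_to_visual`), the function `relational_layer`, the two-dimensional embeddings, and the identity weights are all assumptions chosen for illustration, and the update rule is the standard relational form (a self term plus a relation-specific, degree-normalized sum of neighbor messages).

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def relational_layer(h, edges, W_rel, W_self):
    """One relational message-passing step (hypothetical helper).

    h      : dict mapping node name -> embedding vector
    edges  : list of (source, target, relation) triples
    W_rel  : dict mapping relation name -> weight matrix
    W_self : weight matrix applied to each node's own embedding
    """
    updated = {}
    for node, feat in h.items():
        # Self-connection term.
        agg = matvec(W_self, feat)
        for rel, W in W_rel.items():
            # Incoming neighbors of this node under this relation type.
            sources = [s for s, d, r in edges if d == node and r == rel]
            for s in sources:
                msg = matvec(W, h[s])
                # Normalize by the number of neighbors for this relation.
                agg = [a + m / len(sources) for a, m in zip(agg, msg)]
        updated[node] = relu(agg)
    return updated


# Toy embeddings standing in for the video and kinematics encoder outputs.
h = {"visual": [1.0, 0.0], "kinematics": [0.0, 2.0]}
edges = [
    ("visual", "kinematics", "visual_to_kin"),
    ("kinematics", "visual", "kin_to_visual"),
]
identity = [[1.0, 0.0], [0.0, 1.0]]
W_rel = {"visual_to_kin": identity, "kin_to_visual": identity}

fused = relational_layer(h, edges, W_rel, W_self=identity)
print(fused)
```

With identity weights each node simply adds its neighbor's embedding to its own, so both nodes end up carrying information from both modalities. In a trained model the relation-specific matrices would be learned, which is what lets the network weight each cross-modal relation differently instead of concatenating features.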
