Surgical gesture recognition with time delay neural network based on kinematic data

Automatic gesture recognition during surgical procedures is an enabling technology for advanced assistance features in surgical robotic systems (SRSs). Examples of such features are user-specific feedback during the execution of complex actions, prompt detection of safety-critical situations, and autonomous execution of procedure sub-steps. Video data are available for all minimally invasive surgical procedures, but SRSs can also provide accurate movement measurements in the form of kinematic data. Kinematic data provide low-dimensional features for gesture recognition, which enables on-line processing during data acquisition. We therefore propose a Time Delay Neural Network (TDNN) applied to kinematic data to introduce temporal modelling into gesture recognition. We evaluate the accuracy and precision of the proposed method on a public benchmark dataset for surgical gesture recognition (JIGSAWS). To evaluate the generalization capability of the proposed method, we acquired a new dataset that introduces a different training exercise executed in a virtual environment. The dataset is publicly available so that other methods can be tested on it. The results are comparable with those of other methods in the literature, while the computational cost remains compatible with on-line processing during a surgical procedure. The proposed method and the novel dataset are key components in the development of future autonomous SRSs with advanced situation awareness capabilities.
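To make the idea of temporal modelling with a TDNN concrete, the following is a minimal NumPy sketch of a time-delay layer applied to a sequence of kinematic frames. It is an illustration under stated assumptions, not the architecture evaluated in this work: the function names (`tdnn_layer`, `classify_frames`), the frame offsets, the layer sizes, the ReLU activation and the random weights are all hypothetical, and no training is shown.

```python
import numpy as np


def tdnn_layer(x, weights, bias, offsets):
    """One time-delay layer: each output frame combines input frames at t + offsets.

    x:       (T, D_in) sequence of kinematic feature vectors
    weights: (len(offsets), D_in, D_out), one weight matrix per offset
    bias:    (D_out,)
    offsets: relative frame offsets, e.g. (-2, 0, 2)
    Returns an array of shape (T - (max(offsets) - min(offsets)), D_out).
    """
    lo, hi = min(offsets), max(offsets)
    t_out = x.shape[0] - (hi - lo)
    if t_out <= 0:
        raise ValueError("sequence is shorter than the layer's temporal context")
    out = np.tile(bias, (t_out, 1)).astype(float)
    for k, off in enumerate(offsets):
        start = off - lo  # shift so that the earliest offset starts at frame 0
        out += x[start:start + t_out] @ weights[k]
    return np.maximum(out, 0.0)  # ReLU (assumed activation)


def classify_frames(x, params):
    """Stack two time-delay layers and a per-frame linear read-out (hypothetical layout)."""
    h = tdnn_layer(x, params["w1"], params["b1"], params["offsets1"])
    h = tdnn_layer(h, params["w2"], params["b2"], params["offsets2"])
    scores = h @ params["w_out"] + params["b_out"]
    return scores.argmax(axis=1)  # one gesture label per remaining frame


# Hypothetical sizes: 50 frames, 6 kinematic features, 4 gesture classes.
rng = np.random.default_rng(0)
params = {
    "offsets1": (-2, -1, 0, 1, 2),
    "w1": rng.normal(size=(5, 6, 16)),
    "b1": np.zeros(16),
    "offsets2": (-3, 0, 3),  # wider, sub-sampled context in the second layer
    "w2": rng.normal(size=(3, 16, 16)),
    "b2": np.zeros(16),
    "w_out": rng.normal(size=(16, 4)),
    "b_out": np.zeros(4),
}
labels = classify_frames(rng.normal(size=(50, 6)), params)
```

Because each layer only looks at a fixed window of past and future frames, the output for a given frame depends on a bounded temporal context (here 11 frames in total), which is the property that keeps this kind of model compatible with on-line processing.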
