Deep Learning Based Robotic Tool Detection and Articulation Estimation With Spatio-Temporal Layers

Surgical-tool joint detection from laparoscopic images is an important but challenging task in computer-assisted minimally invasive surgery. Variations in illumination, background, and the number of tools in the field of view all complicate algorithm design and model training. Such challenges could potentially be tackled by exploiting the temporal information in laparoscopic videos rather than handling the problem frame by frame. In this letter, we propose a novel encoder–decoder architecture for surgical-instrument joint detection and localization that uses three-dimensional convolutional layers to exploit spatio-temporal features from laparoscopic videos. When tested on benchmark and custom-built datasets, a median Dice similarity coefficient of 85.1% with an interquartile range of 4.6% highlights performance better than the state of the art based on single-frame processing. Alongside the novelty of the network architecture, the inclusion of temporal information appears to be particularly useful when processing images whose backgrounds were unseen during training, indicating that spatio-temporal features help the joint-detection solution generalize.
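The Dice similarity coefficient used for evaluation above measures the overlap between a predicted and a ground-truth binary mask. A minimal illustrative sketch (not the paper's evaluation code; the flattened 0/1 mask representation and the empty-mask convention are assumptions for illustration):

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks.

    pred, target: flat sequences of 0/1 values of equal length.
    Returns 2*|A ∩ B| / (|A| + |B|); defined as 1.0 when both
    masks are empty (perfect agreement by convention).
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        return 1.0
    return 2.0 * intersection / total

# Example: 6-pixel masks agreeing on two of the foreground pixels.
pred   = [1, 1, 0, 0, 1, 0]
target = [1, 1, 1, 0, 0, 0]
print(dice_coefficient(pred, target))  # → 0.666...
```

In practice the coefficient is computed per frame over the network's binarized joint-location output and then summarized as a median with an interquartile range, as reported in the abstract.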
