Real-time surgical instrument detection in robot-assisted surgery using a convolutional neural network cascade

Surgical instrument detection in robot-assisted surgery videos is an important vision component of these systems. Most current deep learning methods focus on single-tool detection and suffer from low detection speed. To address this, the authors propose a novel frame-by-frame method for real-time multi-tool detection, based on a cascade of two different convolutional neural networks (CNNs). An hourglass network and a modified visual geometry group (VGG) network are applied to jointly predict tool localisation. The hourglass network outputs detection heatmaps representing the locations of tool tip areas, and the modified VGG network performs bounding-box regression for the tool tip areas on these heatmaps stacked with the input RGB image frames. The authors' method is tested on the publicly available EndoVis Challenge dataset and the ATLAS Dione dataset. The experimental results show that the method outperforms mainstream detection methods in both detection accuracy and speed.
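
The hand-off between the two networks, in which heatmaps are stacked with the RGB frame before bounding-box regression, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `stack_heatmap`, the single heatmap channel, the toy 2 x 3 frame, and the use of plain nested lists instead of an array library are all assumptions made for illustration. The abstract does not state how many heatmap channels are used or at what resolution.

```python
# Hypothetical sketch: append a heatmap channel to every pixel of an RGB frame.
# All names, shapes, and the single heatmap channel are assumptions,
# not details taken from the paper.

def stack_heatmap(rgb_frame, heatmap):
    """Return an H x W nested list whose pixels are [R, G, B, heat]."""
    if len(rgb_frame) != len(heatmap):
        raise ValueError("frame and heatmap heights differ")
    stacked = []
    for rgb_row, heat_row in zip(rgb_frame, heatmap):
        if len(rgb_row) != len(heat_row):
            raise ValueError("frame and heatmap widths differ")
        # Concatenate the three colour values with the heatmap response.
        stacked.append([list(pixel) + [heat]
                        for pixel, heat in zip(rgb_row, heat_row)])
    return stacked


# Toy 2 x 3 frame and a matching heatmap with one strong response.
frame = [[(10, 20, 30), (40, 50, 60), (70, 80, 90)],
         [(15, 25, 35), (45, 55, 65), (75, 85, 95)]]
heat = [[0.0, 0.1, 0.0],
        [0.0, 0.9, 0.2]]

stacked = stack_heatmap(frame, heat)
print(len(stacked), len(stacked[0]), len(stacked[0][0]))
```

In this sketch the second network would receive the four-channel result as its input, so the regression stage sees both the image appearance and the first network's estimate of where the tool tips lie.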
