An Anchor-Free Convolutional Neural Network for Real-Time Surgical Tool Detection in Robot-Assisted Surgery

Robot-assisted surgery (RAS), a type of minimally invasive surgery, is used in a variety of clinical procedures because it offers faster recovery and causes less pain. Automatic video analysis of RAS is an active research area, in which precise, real-time surgical tool detection is an important step. However, most deep learning methods currently employed for surgical tool detection rely on anchor boxes, which results in low detection speeds. In this paper, we propose an anchor-free convolutional neural network (CNN) architecture: a novel frame-by-frame method that uses a compact stacked hourglass network and models each surgical tool as a single point, the center of its bounding box. Our detector eliminates the need to design a set of anchor boxes, and it is end-to-end differentiable, simpler, more accurate, and more efficient than anchor-box-based detectors. To our knowledge, our method is the first to apply the anchor-free idea to surgical tool detection in RAS videos. Experimental results show that our method achieves 98.5% mAP and 100% mAP at 37.0 fps on the ATLAS Dione and EndoVis Challenge datasets, respectively, achieving real-time surgical tool detection in RAS videos.
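The center-point idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not describe the decoding step, so the Gaussian heatmap, the 3x3 local-maximum peak search, and all function names and parameters below (`box_to_center`, `render_heatmap`, `find_peaks`, `center_to_box`, `sigma`, `threshold`) are assumptions chosen for illustration. In a trained detector the heatmap and the box size would be predicted by the network; here they are built by hand from a known box.

```python
import math


def box_to_center(box):
    """Convert a box (x1, y1, x2, y2) to (cx, cy, w, h)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0, x2 - x1, y2 - y1)


def render_heatmap(center, width, height, sigma=2.0):
    """Draw an unnormalized Gaussian bump at `center` on a height x width grid."""
    cx, cy = center
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def find_peaks(heatmap, threshold=0.5):
    """Return (x, y, score) for cells above `threshold` that are the
    maximum of their 3x3 neighbourhood (no anchor boxes involved)."""
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            value = heatmap[y][x]
            if value < threshold:
                continue
            neighbourhood = [
                heatmap[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            if value >= max(neighbourhood):
                peaks.append((x, y, value))
    return peaks


def center_to_box(cx, cy, w, h):
    """Recover (x1, y1, x2, y2) from a center point and a box size."""
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)


# Hypothetical tool box in a 40 x 24 frame.
cx, cy, w, h = box_to_center((10, 6, 30, 18))
heatmap = render_heatmap((cx, cy), width=40, height=24)
for px, py, score in find_peaks(heatmap):
    print(center_to_box(px, py, w, h), round(score, 3))
```

Because each tool is reduced to one heatmap peak, detection becomes a peak search plus a size lookup at that location, which is the step that replaces enumerating and scoring a predefined set of anchor boxes.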
