Surgical Tools Detection Based on Training Sample Adaptation in Laparoscopic Videos

The performance of object detection methods plays an important role in the recognition of surgical tools, and is a key link in the automated evaluation of surgical skills. In this paper, we propose a novel framework for one-stage object detection based on a sample adaptive process controlled by reinforcement learning, which can maintain the speed advantage while maintaining higher accuracy than two-stage object detection methods. We use m2cai16-tool-locations and AJU-Set, two datasets covering seven surgical tools with spatial information collected from hospital gallbladder surgery videos to evaluate and verify the effectiveness of our proposed framework. The experiments show that our proposed framework can make the one-stage object detection method achieve 70.1% and 77.3% accuracy on m2cai16-tool-locations and AJU-Set, respectively. We further validated the effectiveness of our proposed framework by analyzing the usage patterns, motion trajectories, and mobile values of surgical tools.

[1]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Quoc V. Le,et al.  Randaugment: Practical automated data augmentation with a reduced search space , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[4]  Nassir Navab,et al.  On-line Recognition of Surgical Activity for Monitoring in the Operating Room , 2008, AAAI.

[5]  Wei Liu,et al.  DSSD : Deconvolutional Single Shot Detector , 2017, ArXiv.

[6]  Andru Putra Twinanda,et al.  Single- and Multi-Task Architectures for Tool Presence Detection Challenge at M2CAI 2016 , 2016, ArXiv.

[7]  Jin Young Choi,et al.  Action-Decision Networks for Visual Tracking with Deep Reinforcement Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Larry S. Davis,et al.  An Analysis of Scale Invariance in Object Detection - SNIP , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9]  Hao Chen,et al.  FCOS: Fully Convolutional One-Stage Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[10]  Henry C. Lin,et al.  Towards automatic skill evaluation: Detection and segmentation of robot-assisted surgical motions , 2006, Computer aided surgery : official journal of the International Society for Computer Aided Surgery.

[11]  Zhiqiang Shen,et al.  DSOD: Learning Deeply Supervised Object Detectors from Scratch , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[12]  Bingbing Ni,et al.  Scale-Transferrable Object Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13]  Gregory D. Hager,et al.  Surgical Gesture Segmentation and Recognition , 2013, MICCAI.

[14]  Fahad Shahbaz Khan,et al.  Enriched Feature Guided Refinement Network for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[15]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Jinjun Xiong,et al.  Revisiting RCNN: On Awakening the Classification Power of Faster RCNN , 2018, ECCV.

[17]  Nuno Vasconcelos,et al.  Cascade R-CNN: Delving Into High Quality Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[18]  Yuning Jiang,et al.  FoveaBox: Beyound Anchor-Based Object Detection , 2019, IEEE Transactions on Image Processing.

[19]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Zhe Chen,et al.  Context Refinement for Object Detection , 2018, ECCV.

[21]  Р Ю Чуйков,et al.  Обнаружение транспортных средств на изображениях загородных шоссе на основе метода Single shot multibox Detector , 2017 .

[22]  Peng Chen,et al.  Surgical Tools Detection Based on Modulated Anchoring Network in Laparoscopic Videos , 2020, IEEE Access.

[23]  Gwénolé Quellec,et al.  Real-time analysis of cataract surgery videos using statistical models , 2017, Multimedia Tools and Applications.

[24]  Shifeng Zhang,et al.  Single-Shot Refinement Neural Network for Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[25]  T. Osler,et al.  Complications in surgical patients. , 2002, Archives of surgery.

[26]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  David Budden,et al.  Distributed Prioritized Experience Replay , 2018, ICLR.

[28]  Qi Tian,et al.  CenterNet: Keypoint Triplets for Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[29]  Xingyi Zhou,et al.  Bottom-Up Object Detection by Grouping Extreme and Center Points , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Marios Savvides,et al.  Feature Selective Anchor-Free Module for Single-Shot Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[32]  Kai Chen,et al.  Region Proposal by Guided Anchoring , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Andru Putra Twinanda,et al.  EndoNet: A Deep Architecture for Recognition Tasks on Laparoscopic Videos , 2016, IEEE Transactions on Medical Imaging.

[34]  Hei Law,et al.  CornerNet: Detecting Objects as Paired Keypoints , 2018, ECCV.

[35]  Pierre Jannin,et al.  A Framework for the Recognition of High-Level Surgical Tasks From Video Images for Cataract Surgeries , 2012, IEEE Transactions on Biomedical Engineering.

[36]  Jonathan Krause,et al.  Tool Detection and Operative Skill Assessment in Surgical Videos Using Region-Based Convolutional Neural Networks , 2018, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[37]  Sandrine Voros,et al.  2D/3D Real-Time Tracking of Surgical Instruments Based on Endoscopic Image Processing , 2015, CARE@MICCAI.

[38]  T. Weiser,et al.  Global access to surgical care: a modelling study. , 2015, The Lancet. Global health.

[39]  Klaus Schöffmann,et al.  Temporal segmentation of laparoscopic videos into surgical phases , 2016, 2016 14th International Workshop on Content-Based Multimedia Indexing (CBMI).

[40]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[41]  Anirban Mukhopadhyay,et al.  Tool and Phase recognition using contextual CNN features , 2016, ArXiv.

[42]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[43]  Svetlana Lazebnik,et al.  Active Object Localization with Deep Reinforcement Learning , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[44]  Shifeng Zhang,et al.  Bridging the Gap Between Anchor-Based and Anchor-Free Detection via Adaptive Training Sample Selection , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Wojciech M. Czarnecki,et al.  Grandmaster level in StarCraft II using multi-agent reinforcement learning , 2019, Nature.

[46]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.