Graph Convolutional Nets for Tool Presence Detection in Surgical Videos

Surgical tool presence detection is one of the key problems in automatic surgical video content analysis. Solving it benefits many applications, such as the evaluation of surgical instrument usage and automatic surgical report generation. Because each video is only sparsely labeled at the frame level, meaning that only a small portion of its frames carry labels, existing approaches treat the task purely as an image (frame) classification problem and ignore the temporal information in surgical videos. In this paper, we propose a deep neural network model that uses both spatial and temporal information from surgical videos for surgical tool presence detection. The proposed model applies Graph Convolutional Networks (GCNs) along the temporal dimension to learn better features by considering the relationship between consecutive video frames. To the best of our knowledge, this is the first work that takes videos as input to solve the surgical tool presence detection problem. Our experiments demonstrate that using temporal information offers a significant improvement on this problem, and that the proposed approach achieves better performance than all state-of-the-art methods.
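To make the idea of graph convolution along the temporal dimension concrete, the following is a minimal sketch and not the model proposed here. It assumes a simple chain graph in which each frame is linked to its neighbors within a fixed window, and it uses the common symmetrically normalized propagation rule ReLU(Â H W); the abstract does not specify the graph construction or the GCN variant, so the names `temporal_adjacency`, `normalize`, `gcn_layer`, and the `window` parameter are all hypothetical.

```python
import math

def temporal_adjacency(num_frames, window=1):
    """Adjacency with self-loops: frame i is linked to frames within `window` steps.
    (Assumed chain-graph construction, for illustration only.)"""
    return [[1.0 if abs(i - j) <= window else 0.0 for j in range(num_frames)]
            for i in range(num_frames)]

def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}, with D the degree matrix of A."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(features, weights, adj_norm):
    """One graph convolution: ReLU(adj_norm @ features @ weights)."""
    mixed = matmul(adj_norm, features)   # aggregate features from neighboring frames
    out = matmul(mixed, weights)         # linear transform (learned in practice)
    return [[max(0.0, v) for v in row] for row in out]

# Toy input: 4 frames with 2-dimensional per-frame features
# (in practice these would come from a CNN backbone).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]       # identity weights, to isolate the smoothing
adj_norm = normalize(temporal_adjacency(len(frames)))
smoothed = gcn_layer(frames, weights, adj_norm)
```

In this toy run the last frame, whose own features are all zero, receives nonzero features from its temporal neighbor. This is the mechanism by which sparsely labeled frames can benefit from the surrounding unlabeled frames.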
