A comparative analysis of multi-backbone Mask R-CNN for surgical tools detection

Real-time surgical tool segmentation and tracking based on convolutional neural networks (CNNs) has gained increasing interest in the field of minimally invasive surgery. Applying these artificial vision technologies can both reduce surgical risks and increase patient safety. Moreover, such models can be used both to track the tools and to detect markers or external artefacts in a real-time video stream. Multiple object detection and instance segmentation can be addressed efficiently by leveraging region-based CNN models. This work therefore compares state-of-the-art multi-backbone Mask R-CNNs on these tasks. We also show that such models can serve as a basis for tracking algorithms. The models were trained and tested on a dataset of 4955 manually annotated images, validated by 3 experts in the field. We tested 12 different combinations of CNN backbones and training hyperparameters. The results show that a modern CNN can tackle the surgical tool detection problem, with the best-performing Mask R-CNN configuration achieving 87% Average Precision (AP) at an Intersection over Union (IoU) threshold of 0.5.
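As an illustration of the IoU criterion behind the reported AP figure, the following minimal sketch computes the IoU of two axis-aligned bounding boxes and checks it against a 0.5 threshold. The function names `box_iou` and `is_match`, and the `(x1, y1, x2, y2)` box layout, are assumptions made for this example; they are not taken from the paper, whose evaluation code is not shown here.

```python
def box_iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero so disjoint boxes yield no intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred, gt, threshold=0.5):
    """A prediction counts as a match when its IoU with the ground truth
    reaches the threshold (0.5 here, an assumption mirroring AP at IoU 0.5)."""
    return box_iou(pred, gt) >= threshold


if __name__ == "__main__":
    ground_truth = (0, 0, 10, 10)
    prediction = (2, 0, 12, 10)  # hypothetical detection, shifted by 2 pixels
    print(box_iou(prediction, ground_truth))
    print(is_match(prediction, ground_truth))
```

AP then summarises precision over recall levels once each detection has been labelled a match or a miss in this way; the same ratio applies to segmentation masks by counting overlapping pixels instead of box areas.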
