Deep Neural Architecture for Localization and Tracking of Surgical Tools in Cataract Surgery

Over the last couple of decades, the quality of surgical interventions has improved owing to the use of computer vision and robotic assistance. One such application of computer vision, the detection of surgical tools in videos, is gaining the attention of the medical image processing community. The main motivation for detecting, localizing, and annotating surgical tools is to develop applications for surgical workflow analysis. Such analysis can aid in report generation and real-time decision support, among other uses. Cataract surgery is a common surgical procedure in which surgeons have direct visual access to the surgical site. Extremely small tools are used for this procedure, and the surgeons observe the surgical site through a surgical microscope. In such cases, detecting the presence of tools can act as an additional aid to the surgeon as well as to other surgical staff. We propose a framework consisting of a Convolutional Neural Network (CNN) that learns to distinguish and detect the presence of various surgical tools by learning robust features from the frames of a surgical video. Various deep neural architectures are evaluated for the task of detecting tools. The baseline models are pretrained on the ImageNet dataset and achieve up to 50% prediction accuracy. All experiments have been validated on the dataset released as part of the Cataracts Grand Challenge. A framework for localization and detection of tools is also proposed, which extracts visual features from glimpses of an image by adaptively selecting regions and processing only those regions at high resolution.
