Learning Latent Temporal Connectionism of Deep Residual Visual Abstractions for Identifying Surgical Tools in Laparoscopy Procedures

Surgical workflow in minimally invasive interventions such as laparoscopy can be modeled with the aid of tool usage information. The video stream already captured during surgery for viewing the surgical site through an endoscope can be leveraged for this purpose, without the need for additional sensors or instruments. We propose a method that learns to detect tool presence in laparoscopy videos by exploiting the temporal connectionist information inherent in a systematically executed surgical procedure, learning both long- and short-range relationships between higher abstractions of the spatial visual features extracted from the surgical video. The proposed framework uses a Convolutional Neural Network (CNN) to extract visual features and a Long Short-Term Memory (LSTM) network to encode the temporal information. The framework has been experimentally verified on a publicly available dataset of 10 training and 5 testing annotated videos, achieving an average accuracy of 88.75% in detecting the tools present.
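The CNN-then-LSTM pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the CNN stage is mocked with random feature vectors, and the feature dimension (2048, as for a deep residual network), hidden size, and tool count (7) are assumptions chosen for illustration. Each frame's feature vector is passed through an LSTM cell, and a per-tool sigmoid head produces multi-label presence probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

FEAT_DIM = 2048   # per-frame CNN feature size (assumed ResNet-style)
HIDDEN = 128      # LSTM hidden state size (illustrative)
N_TOOLS = 7       # number of tool classes (assumption)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMToolDetector:
    """Single-layer LSTM over per-frame CNN features with a
    per-tool sigmoid head (multi-label presence detection)."""

    def __init__(self):
        s = 0.01
        # Stacked gate weights: input, forget, candidate, output
        self.W = rng.normal(0, s, (4 * HIDDEN, FEAT_DIM))
        self.U = rng.normal(0, s, (4 * HIDDEN, HIDDEN))
        self.b = np.zeros(4 * HIDDEN)
        # Classification head
        self.Wy = rng.normal(0, s, (N_TOOLS, HIDDEN))
        self.by = np.zeros(N_TOOLS)

    def forward(self, feats):
        """feats: (T, FEAT_DIM) frame-feature sequence.
        Returns (T, N_TOOLS) per-frame tool-presence probabilities."""
        h = np.zeros(HIDDEN)
        c = np.zeros(HIDDEN)
        probs = []
        for x in feats:
            z = self.W @ x + self.U @ h + self.b
            i = sigmoid(z[0 * HIDDEN:1 * HIDDEN])   # input gate
            f = sigmoid(z[1 * HIDDEN:2 * HIDDEN])   # forget gate
            g = np.tanh(z[2 * HIDDEN:3 * HIDDEN])   # candidate cell state
            o = sigmoid(z[3 * HIDDEN:4 * HIDDEN])   # output gate
            c = f * c + i * g                       # update cell memory
            h = o * np.tanh(c)                      # new hidden state
            probs.append(sigmoid(self.Wy @ h + self.by))
        return np.stack(probs)

# Stand-in for CNN features of a 25-frame clip
feats = rng.normal(size=(25, FEAT_DIM))
p = LSTMToolDetector().forward(feats)
print(p.shape)  # (25, 7)
```

Because the hidden state `h` carries information across frames, a tool briefly occluded in one frame can still be inferred from its recent visibility, which is the motivation for modeling temporal order rather than classifying frames independently.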
