Tool Detection and Operative Skill Assessment in Surgical Videos Using Region-Based Convolutional Neural Networks

Five billion people worldwide lack access to quality surgical care. Surgeon skill varies dramatically, and many surgical patients suffer complications and avoidable harm. Improving surgical training and feedback would help to reduce the rate of complications, half of which have been shown to be preventable. To do this, it is essential to assess operative skill, a process that currently requires experts and is manual, time-consuming, and subjective. In this work, we introduce an approach to automatically assess surgeon performance by tracking and analyzing tool movements in surgical videos, leveraging region-based convolutional neural networks. To study this problem, we also introduce a new dataset, m2cai16-tool-locations, which extends the m2cai16-tool dataset with spatial bounds of tools. While previous methods have addressed tool presence detection, ours is the first to not only detect presence but also spatially localize surgical tools in real-world laparoscopic surgical videos. We show that our method both effectively detects the spatial bounds of tools and significantly outperforms existing methods on tool presence detection. We further demonstrate the ability of our method to assess surgical quality through analysis of tool usage patterns, movement range, and economy of motion.
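
To make the detection component concrete, the following is a minimal sketch of a region-based tool detector. It assumes a PyTorch/torchvision Faster R-CNN fine-tuned on the seven m2cai16-tool instrument classes; the framework choice, function names, and wiring are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch (not the paper's code): adapting a COCO-pretrained
# Faster R-CNN to the seven m2cai16-tool instrument classes.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

TOOL_CLASSES = ["grasper", "bipolar", "hook", "scissors",
                "clipper", "irrigator", "specimen_bag"]

def build_tool_detector(num_tools=len(TOOL_CLASSES)):
    # Start from a detector pretrained on COCO and replace its box head
    # so it predicts the tool classes plus a background class.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_tools + 1)
    return model

@torch.no_grad()
def detect_tools(model, frame, score_thresh=0.5):
    # frame: 3xHxW float tensor in [0, 1]; returns (label, score, box) triples.
    model.eval()
    out = model([frame])[0]
    keep = out["scores"] >= score_thresh
    return [(TOOL_CLASSES[l.item() - 1], s.item(), b.tolist())
            for l, s, b in zip(out["labels"][keep],
                               out["scores"][keep],
                               out["boxes"][keep])]
```

Fine-tuning such a model on the m2cai16-tool-locations bounding boxes would then yield per-frame tool labels and spatial bounds, the raw signal the skill analysis builds on.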
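The skill metrics named in the abstract can likewise be sketched from a detected tool trajectory. The definitions below (path length for economy of motion, trajectory bounding-box area for movement range, frame count for usage time) follow common conventions in the surgical-skill literature and are assumptions; the paper's exact formulations may differ.

```python
import numpy as np

def motion_metrics(centers, fps=25.0):
    """Illustrative skill metrics from one tool's per-frame trajectory.

    centers: (T, 2) array of (x, y) bounding-box centers in pixels.
    These are assumed definitions, not the paper's exact formulas.
    """
    centers = np.asarray(centers, dtype=float)
    steps = np.diff(centers, axis=0)                   # frame-to-frame displacement
    path_length = np.linalg.norm(steps, axis=1).sum()  # economy of motion: shorter is better
    x_range, y_range = centers.max(0) - centers.min(0)
    movement_range = x_range * y_range                 # area of the trajectory's bounding box
    usage_time = len(centers) / fps                    # seconds the tool is visible
    return {"path_length_px": path_length,
            "movement_range_px2": movement_range,
            "usage_time_s": usage_time}
```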