XAI-CBIR: Explainable AI System for Content based Retrieval of Video Frames from Minimally Invasive Surgery Videos

In this paper, we present a human-in-the-loop explainable AI (XAI) system for content-based image retrieval (CBIR) that retrieves video frames similar to a query image from minimally invasive surgery (MIS) videos, for use in surgical education. The system extracts semantic descriptors from MIS video frames using a self-supervised deep learning model. It then applies an iterative query refinement strategy in which a binary classifier, trained online on relevance feedback from the user, progressively refines the search results. Lastly, it uses an XAI technique to generate a saliency map that visually explains why the system considers a retrieved image to be similar to the query image. We evaluated the proposed XAI-CBIR system on the public Cholec80 dataset, which contains 80 videos of minimally invasive cholecystectomy surgeries, with encouraging results.
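The iterative query refinement step can be illustrated with a minimal sketch. The abstract does not specify the classifier, the descriptor dimensionality, or the initial ranking rule, so everything below is an assumption made for illustration: cosine similarity for the first ranking, a small logistic regression fitted by gradient descent as the binary classifier, and a hypothetical `oracle` callback standing in for the user's relevant/irrelevant feedback. The function names (`cosine_rank`, `fit_logistic`, `refine`) are likewise hypothetical and are not taken from the paper.

```python
import numpy as np

def cosine_rank(descriptors, query, k):
    """Return indices of the k descriptors most cosine-similar to the query."""
    d = descriptors / np.linalg.norm(descriptors, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    return np.argsort(-(d @ q))[:k]

def fit_logistic(X, y, lr=0.5, steps=500, l2=1e-2):
    """Fit an L2-regularised logistic regression by plain gradient descent.

    Assumption: the paper only says "binary classifier"; logistic
    regression is used here purely as a stand-in.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted relevance
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b

def refine(descriptors, query, oracle, k=10, rounds=3):
    """Relevance-feedback loop: show k frames, collect labels, re-rank.

    `oracle(i)` is a hypothetical stand-in for the user and returns
    1 if frame i is relevant to the query, else 0.
    """
    shown = cosine_rank(descriptors, query, k)  # initial results
    labelled = {}
    for _ in range(rounds):
        for i in shown:
            labelled[int(i)] = oracle(int(i))
        idx = np.array(list(labelled))
        y = np.array([labelled[i] for i in idx], dtype=float)
        if y.min() == y.max():
            break  # feedback has only one class; nothing to train on yet
        w, b = fit_logistic(descriptors[idx], y)
        # Re-rank the whole collection by classifier score.
        shown = np.argsort(-(descriptors @ w + b))[:k]
    return shown
```

In an interactive system the `oracle` callback would be replaced by the user's clicks on the displayed frames, and the descriptors would come from the self-supervised model rather than from synthetic data.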
