XAI-CBIR: Explainable AI System for Content based Retrieval of Video Frames from Minimally Invasive Surgery Videos

In this paper, we present a human-in-the-loop explainable AI (XAI) system for content-based image retrieval (CBIR) that retrieves video frames similar to a query image from minimally invasive surgery (MIS) videos, for use in surgical education. The system extracts semantic descriptors from MIS video frames using a self-supervised deep learning model. It then employs an iterative query refinement strategy, wherein a binary classifier trained online on the user's relevance feedback is used to iteratively refine the search results. Finally, it uses an XAI technique to generate a saliency map that visually explains why the system considers a retrieved image similar to the query image. We evaluated the proposed XAI-CBIR system on the public Cholec80 dataset, which contains 80 videos of minimally invasive cholecystectomy surgeries, with encouraging results.
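
The retrieval and relevance-feedback loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The frame descriptors are taken as precomputed vectors standing in for the self-supervised features. The initial ranking uses cosine similarity, and the online binary classifier is a logistic regression fitted by gradient descent. The abstract specifies none of these choices. All function names (`cosine`, `initial_ranking`, `train_logistic`, `refine`) and the toy data are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two descriptor vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def initial_ranking(query, database, k):
    """Round 0 (assumed): rank frames by cosine similarity to the query descriptor."""
    scores = [cosine(query, feat) for feat in database]
    # sorted() is stable even with reverse=True, so ties keep database order.
    return sorted(range(len(database)), key=lambda i: scores[i], reverse=True)[:k]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, epochs=200, lr=0.5):
    """Fit a logistic-regression relevance classifier (an assumed stand-in
    for the paper's unspecified binary classifier) with batch gradient descent."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss is (predicted probability - label) * input.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(dim):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def refine(database, feedback, k):
    """Feedback round: train on user labels {frame_index: 1 relevant, 0 not},
    then re-rank every frame by the classifier's decision score."""
    labeled = sorted(feedback)
    w, b = train_logistic([database[i] for i in labeled], [feedback[i] for i in labeled])
    scores = [sum(wj * xj for wj, xj in zip(w, f)) + b for f in database]
    return sorted(range(len(database)), key=lambda i: scores[i], reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical 2-D descriptors; real ones would come from the feature extractor.
    db = [[1.0, 0.2], [1.0, -0.2], [0.9, 0.4], [0.9, -0.4], [0.5, 0.9], [0.5, -0.9]]
    query = [1.0, 0.0]
    print(initial_ranking(query, db, 4))  # [0, 1, 2, 3]
    # The user marks frames 0 and 2 relevant, 1 and 3 not relevant.
    print(refine(db, {0: 1, 1: 0, 2: 1, 3: 0}, 3))  # [4, 2, 0]
```

Each further round would add the user's new labels to `feedback` and call `refine` again. In this toy data the user's notion of relevance follows the second descriptor dimension rather than closeness to the query. The classifier learns that direction and surfaces frame 4, which the cosine ranking had left out of its top 4.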
