Paper information - Content-based retrieval of video segments from minimally invasive surgery videos using deep convolutional video descriptors and iterative query refinement

Despite strong evidence of the clinical and economic benefits of minimally invasive surgery (MIS) for many common surgical procedures, there is a gross underutilization of MIS in many US hospitals, potentially due to its steep learning curve. Intraoperative videos captured using a camera inserted into the body during MIS procedures are emerging as an invaluable resource for MIS education, skill assessment and quality assurance. However, these videos often run for several hours, and there is a pressing need for automated tools to help surgeons quickly find key semantic segments of interest within MIS videos. In this paper, we present a novel integrated approach for facilitating content-based retrieval of video segments that are semantically similar to a query video within a large collection of MIS videos. We use state-of-the-art deep 3D convolutional neural network (CNN) models pre-trained on large public video classification datasets to extract spatiotemporal features from MIS video segments, and employ an iterative query refinement (IQR) strategy wherein a support vector machine (SVM) classifier, trained online on relevance feedback from the user, is used to refine the search results iteratively. We show that our method outperforms the state-of-the-art on the SurgicalActions160 dataset containing 160 video clips of typical surgical actions in gynecologic MIS procedures.
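The iterative query refinement loop described in the abstract can be illustrated with a minimal, dependency-free sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 2-D toy descriptors stand in for 3D CNN features, the initial ranking uses squared Euclidean distance, the simulated user is an `oracle` callback, and a small hand-written linear SVM (subgradient descent on the hinge loss) stands in for a library SVM such as scikit-learn's. The names `train_linear_svm`, `initial_ranking`, `refine` and `precision_at_k` are hypothetical helpers.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(samples, labels, epochs=50, eta=0.05, lam=0.01):
    """Fit w, b by subgradient descent on the L2-regularised hinge loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):            # y is +1 or -1
            violated = y * (dot(w, x) + b) < 1.0
            w = [(1.0 - eta * lam) * wj for wj in w]  # regulariser shrink
            if violated:                              # hinge subgradient step
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
                b += eta * y
    return w, b

def initial_ranking(query, database):
    """Rank clips by squared Euclidean distance to the query descriptor."""
    def dist2(i):
        return sum((q - d) ** 2 for q, d in zip(query, database[i]))
    return sorted(range(len(database)), key=dist2)

def refine(query, database, oracle, rounds=2, k=4):
    """Iterative query refinement: label the top-k, retrain, re-rank."""
    ranking = initial_ranking(query, database)
    feedback = {}                                     # clip index -> +1 / -1
    for _ in range(rounds):
        for i in ranking[:k]:                         # simulated user feedback
            feedback[i] = 1 if oracle(i) else -1
        idx = sorted(feedback)
        w, b = train_linear_svm([database[i] for i in idx],
                                [feedback[i] for i in idx])
        # Re-rank the whole collection by SVM decision value, highest first.
        ranking = sorted(range(len(database)),
                         key=lambda i: -(dot(w, database[i]) + b))
    return ranking

def precision_at_k(ranking, relevant, k):
    return sum(1 for i in ranking[:k] if i in relevant) / k

# Toy collection (assumed): the first coordinate separates relevant clips
# from irrelevant ones, the second is a nuisance dimension.
DATABASE = [(4.0, 0.0), (4.0, 1.0), (4.0, 3.0), (4.0, -3.0),
            (-4.0, 0.0), (-4.0, 0.5), (-4.0, -2.0), (-4.0, 3.0)]
RELEVANT = {0, 1, 2, 3}
QUERY = (0.5, 0.0)

before = initial_ranking(QUERY, DATABASE)
after = refine(QUERY, DATABASE, oracle=lambda i: i in RELEVANT)
print(precision_at_k(before, RELEVANT, 4), precision_at_k(after, RELEVANT, 4))
```

On this toy collection the distance-only ranking places two irrelevant clips in the top four, while the ranking after feedback places all four relevant clips first. In practice the descriptors would be high-dimensional CNN features and the classifier would more likely come from an established SVM library; the sketch only shows the control flow of labelling, retraining and re-ranking.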
