Surgical video retrieval using deep neural networks

Although the volume of raw surgical video, i.e. video captured during surgical interventions, is growing fast, automatic search and retrieval remain a challenge. This is mainly due to the nature of the content: visually inconsistent tissue, the diversity of internal organs, abrupt viewpoint changes and illumination variation. We propose a framework for retrieving surgical videos and a protocol for evaluating the results. The method consists of temporal shot segmentation followed by shot representation based on deep features, and the protocol introduces evaluation criteria that are new to the field. The experimental results demonstrate the superiority of the proposed method and point towards a more effective protocol for evaluating surgical video retrieval.
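To make the two stages named above more concrete, the following is a minimal sketch of shot segmentation followed by feature-based retrieval. It is not the authors' implementation: the abstract does not specify the segmentation rule, the pooling scheme or the similarity measure, so the consecutive-frame cosine threshold, the mean pooling and the cosine ranking used here are all assumptions, and the function names (`segment_shots`, `shot_descriptor`, `rank_shots`) are hypothetical. Each frame is assumed to be already encoded as a feature vector (in practice, for instance, activations from a pretrained convolutional network); tiny hand-made vectors stand in for those features.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def segment_shots(frames: List[Vector], threshold: float = 0.5) -> List[List[int]]:
    """Assumed rule: start a new shot whenever the similarity between
    consecutive frame features drops below `threshold`."""
    if not frames:
        return []
    shots = [[0]]
    for i in range(1, len(frames)):
        if cosine(frames[i - 1], frames[i]) < threshold:
            shots.append([i])      # abrupt change: open a new shot
        else:
            shots[-1].append(i)    # same shot continues
    return shots


def shot_descriptor(frames: List[Vector], indices: List[int]) -> List[float]:
    """Assumed representation: mean of the frame features in the shot."""
    dim = len(frames[indices[0]])
    return [sum(frames[i][d] for i in indices) / len(indices) for d in range(dim)]


def rank_shots(query: Vector, descriptors: List[Vector]) -> List[int]:
    """Return shot indices ordered from most to least similar to the query."""
    return sorted(
        range(len(descriptors)),
        key=lambda k: cosine(query, descriptors[k]),
        reverse=True,
    )


# Toy stand-ins for per-frame deep features (illustrative values only).
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
shots = segment_shots(frames)
descriptors = [shot_descriptor(frames, s) for s in shots]
ranking = rank_shots([0.0, 1.0], descriptors)
```

With these toy vectors the frames split into two shots, and a query resembling the last two frames ranks the second shot first. A real system would replace the toy features with network activations and would likely need a more robust boundary detector than a fixed threshold.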
