Static vs. Dynamic Content Descriptors for Video Retrieval in Laparoscopy

Minimally invasive surgery has recently attracted attention from the multimedia community because systematic video documentation is on the rise in this medical field. The rapidly growing video archives demand effective and efficient techniques for retrieving specific information from large video collections whose content is visually very homogeneous. One specific challenge in this context is similarity search, i.e., retrieving scenes that show similar surgical actions. Although this task is highly relevant to surgeons and other health professionals, and increasingly so, it has rarely been investigated in the literature for this particular domain. In this paper, we propose and evaluate a number of static and dynamic content descriptors for this purpose. Static descriptors take into account only individual images, while dynamic descriptors consider the motion within a scene. Our experimental results show that although static descriptors achieve the highest overall performance, dynamic descriptors are much more discriminative for certain classes of surgical actions. We conclude that the two approaches have complementary strengths and that further research should investigate methods to combine them.
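The distinction between static and dynamic descriptors can be illustrated with a minimal sketch. It is not one of the descriptors evaluated in the paper: the intensity histogram, the frame-difference histogram, the L1 distance, and all function names below are assumptions chosen only to show why a per-image descriptor cannot separate two scenes that look alike but move differently.

```python
from typing import List

Frame = List[List[int]]  # hypothetical grayscale frame, values in 0..255


def static_descriptor(frame: Frame, bins: int = 4) -> List[float]:
    """Normalized intensity histogram of one frame (ignores motion)."""
    hist = [0] * bins
    count = 0
    for row in frame:
        for value in row:
            hist[min(value * bins // 256, bins - 1)] += 1
            count += 1
    return [h / count for h in hist] if count else [0.0] * bins


def dynamic_descriptor(frames: List[Frame], bins: int = 4) -> List[float]:
    """Normalized histogram of absolute pixel differences between
    consecutive frames, used here as a crude stand-in for motion."""
    hist = [0] * bins
    count = 0
    for prev, curr in zip(frames, frames[1:]):
        for prev_row, curr_row in zip(prev, curr):
            for a, b in zip(prev_row, curr_row):
                hist[min(abs(b - a) * bins // 256, bins - 1)] += 1
                count += 1
    return [h / count for h in hist] if count else [0.0] * bins


def l1_distance(x: List[float], y: List[float]) -> float:
    """Sum of absolute differences between two descriptors."""
    return sum(abs(a - b) for a, b in zip(x, y))


# Two toy scenes: one is motionless, the other flips left and right halves.
f = [[0, 255], [0, 255]]
g = [[255, 0], [255, 0]]
still_scene = [f, f]
moving_scene = [f, g]

# Every individual frame has the same intensity histogram.
static_gap = l1_distance(static_descriptor(f), static_descriptor(g))
# The frame-difference histograms of the two scenes do not overlap.
dynamic_gap = l1_distance(dynamic_descriptor(still_scene),
                          dynamic_descriptor(moving_scene))
print(static_gap, dynamic_gap)
```

In this toy case the static distance is zero while the dynamic distance is maximal, which mirrors the claim that motion-based descriptors can be more discriminative for some action classes even when image-based descriptors perform better overall.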
