Binary convolutional neural network features off-the-shelf for image to video linking in endoscopic multimedia databases

With rigorous long-term archiving of endoscopic surgeries, vast amounts of video and image data accumulate. Surgeons cannot spend their valuable time manually searching endoscopic multimedia databases (EMDBs) or manually maintaining links to interesting sections so that relevant surgery sections can be retrieved quickly. To enable surgeons to access the relevant surgery scenes quickly, we exploit the fact that surgeons record external images in addition to the surgery video, and we aim to link these images to the appropriate video sequence in the EMDB using a query-by-example approach. We propose binary Convolutional Neural Network (CNN) features off-the-shelf and compare them to several baselines: pixel-based comparison (PSNR), image structure comparison (SSIM), hand-crafted global features (CEDD and feature signatures), and the CNN baselines Histograms of Class Confidences (HoCC) and Neural Codes (NC). For evaluation, we use 5.5 h of endoscopic video material and 69 query images selected by medical experts. We compare the performance of the aforementioned image matching methods in terms of video hit rate and, for correct video predictions, the distance to the true playback time stamp (PTS). Our evaluation shows that binary CNN features are compact yet powerful image descriptors for retrieval in the endoscopic imaging domain. They maintain state-of-the-art performance while requiring little storage space, and hence provide the best compromise.
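The general idea behind binary CNN features off-the-shelf can be sketched as follows. Take activations from a pretrained CNN, reduce each dimension to a single bit, and link a query image to the indexed video frame with the smallest Hamming distance. The sketch also computes the two evaluation measures named above: video hit rate, and mean distance to the true PTS over correct video predictions. This is a minimal illustration under stated assumptions, not the authors' exact pipeline. The threshold of zero, the `Frame` record, the function names `binarize`, `hamming`, `link_image`, and `evaluate`, and the toy four-dimensional vectors standing in for real CNN activations are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def binarize(features: Sequence[float], threshold: float = 0.0) -> int:
    """Pack a real-valued CNN feature vector into an integer bit string.

    Bit i is set when features[i] > threshold. The threshold of 0.0 is an
    assumption for illustration, not necessarily the paper's choice.
    """
    code = 0
    for i, value in enumerate(features):
        if value > threshold:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes (XOR + popcount)."""
    return bin(a ^ b).count("1")


@dataclass(frozen=True)
class Frame:
    """Hypothetical index entry: one video frame and its binary descriptor."""
    video_id: str
    pts: float  # playback time stamp in seconds
    code: int


def link_image(query: Sequence[float], index: List[Frame]) -> Frame:
    """Return the indexed frame whose code is closest to the query's code."""
    q = binarize(query)
    return min(index, key=lambda frame: hamming(q, frame.code))


def evaluate(predictions: List[Frame],
             truths: List[Tuple[str, float]]) -> Tuple[float, float]:
    """Video hit rate and mean |PTS error| over correct video predictions."""
    errors = [abs(pred.pts - true_pts)
              for pred, (true_vid, true_pts) in zip(predictions, truths)
              if pred.video_id == true_vid]
    hit_rate = len(errors) / len(truths)
    mean_pts_error = sum(errors) / len(errors) if errors else float("nan")
    return hit_rate, mean_pts_error


if __name__ == "__main__":
    # Toy index: 4-dimensional vectors stand in for real CNN activations.
    index = [
        Frame("video_a", 12.0, binarize([0.9, 0.0, 0.3, 0.0])),
        Frame("video_a", 48.0, binarize([0.0, 0.7, 0.0, 0.2])),
        Frame("video_b", 5.0, binarize([0.4, 0.4, 0.0, 0.0])),
    ]
    best = link_image([0.8, 0.0, 0.5, 0.1], index)
    print(best.video_id, best.pts)               # video_a 12.0
    print(evaluate([best], [("video_a", 10.0)]))  # (1.0, 2.0)
```

Because each dimension shrinks from a 32-bit float to a single bit, a binary code needs 1/32 of the storage of the float32 feature it was derived from, and XOR followed by a population count is cheap to compute. This illustrates the storage advantage the abstract attributes to binary CNN features.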
