Learning Semantic and Visual Similarity for Endomicroscopy Video Retrieval

Content-based image retrieval (CBIR) is a valuable computer vision technique that is increasingly being applied in the medical community for diagnosis support. However, traditional CBIR systems only deliver visual outputs, i.e., images having a similar appearance to the query, which are not directly interpretable by physicians. Our objective is to provide a system for endomicroscopy video retrieval that delivers both visual and semantic outputs that are consistent with each other. In a previous study, we developed an adapted bag-of-visual-words method for endomicroscopy retrieval, called “Dense-Sift,” that computes a visual signature for each video. In this paper, we present a novel approach to complement visual similarity learning with semantic knowledge extraction, in the field of in vivo endomicroscopy. We first leverage a semantic ground truth based on eight binary concepts in order to transform these visual signatures into semantic signatures that reflect how strongly the presence of each semantic concept is expressed by the visual words describing the videos. Using cross-validation, we demonstrate that, in terms of semantic detection, our intuitive Fisher-based method for transforming visual-word histograms into semantic estimations outperforms support vector machine (SVM) methods with statistical significance. In a second step, we propose to improve retrieval relevance by learning an adjusted similarity distance from a perceived similarity ground truth. As a result, our distance learning method yields a statistically significant improvement in the correlation with the perceived similarity. We also demonstrate that, in terms of perceived similarity, the recall performance of the semantic signatures is close to that of the visual signatures and significantly better than those of several state-of-the-art CBIR methods. The semantic signatures are thus able to communicate high-level medical knowledge while being consistent with the low-level visual signatures and much shorter than them. In our resulting retrieval system, we decide to use visual signatures for perceived similarity learning and retrieval, and semantic signatures to output additional information, expressed in the endoscopist's own language, which provides a relevant semantic translation of the visual retrieval outputs.
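As a rough illustration of how a visual-word histogram could be turned into a short semantic signature, the following minimal sketch weights each visual word by a Fisher-criterion-style score for each binary concept and sums the weighted word frequencies. The abstract does not give the exact formula, so the score used here (difference of class means divided by the sum of class variances), the function names `fisher_weights` and `semantic_signature`, and the toy data are all assumptions made for illustration; this is not the authors' implementation.

```python
from statistics import mean, pvariance


def fisher_weights(histograms, labels, eps=1e-9):
    """Fisher-style score of each visual word for ONE binary concept.

    histograms: list of visual-word histograms, all of the same length.
    labels:     list of 0/1 flags, 1 if the concept is present in the video.
    Assumes at least one positive and one negative video.
    The scoring formula is an illustrative assumption, not the paper's.
    """
    pos = [h for h, y in zip(histograms, labels) if y == 1]
    neg = [h for h, y in zip(histograms, labels) if y == 0]
    weights = []
    for k in range(len(histograms[0])):
        p = [h[k] for h in pos]  # frequency of word k in positive videos
        q = [h[k] for h in neg]  # frequency of word k in negative videos
        # Between-class separation over within-class spread.
        weights.append((mean(p) - mean(q)) / (pvariance(p) + pvariance(q) + eps))
    return weights


def semantic_signature(histogram, weights_per_concept):
    """One value per concept: weighted sum of the visual-word frequencies."""
    return [
        sum(w * x for w, x in zip(weights, histogram))
        for weights in weights_per_concept
    ]


# Hypothetical toy data: 4 videos, 2 visual words, 1 concept.
toy_histograms = [[0.8, 0.2], [0.7, 0.3], [0.2, 0.8], [0.1, 0.9]]
toy_labels = [1, 1, 0, 0]

toy_weights = fisher_weights(toy_histograms, toy_labels)
sig_present = semantic_signature(toy_histograms[0], [toy_weights])
sig_absent = semantic_signature(toy_histograms[3], [toy_weights])
```

With eight concepts, `weights_per_concept` would hold eight weight vectors and each signature would have eight entries, which is much shorter than the visual-word histogram it is derived from.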
