Content-based versus semantic-based retrieval: an LIDC case study

Content-based image retrieval (CBIR) is an active area of medical imaging research. One use of CBIR is to present known reference images that are similar to an unknown case. These comparison images may reduce the radiologist's uncertainty in interpreting that case. It is therefore important to give radiologists systems whose computed similarity corresponds to human-perceived similarity. In our previous work, we developed an open-source CBIR system that takes a computed tomography (CT) image of a lung nodule as a query and retrieves similar lung nodule images based on content-based image features. In this paper, we extend that work by studying the relationship between two types of retrieval, content-based and semantic-based, with the final goal of integrating them into a system that takes advantage of both approaches. Our preliminary results on the Lung Image Database Consortium (LIDC) dataset, using four types of image features, seven radiologist-rated semantic characteristics, and two simple similarity measures, show that a substantial number of nodules identified as similar based on image features are also identified as similar based on semantic characteristics. Furthermore, integrating the two types of features improves similarity retrieval with respect to certain nodule characteristics.
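The comparison described above can be illustrated with a minimal sketch. It is not the system from this work: the function names (`top_k`, `overlap`), the nodule identifiers, and all feature and rating values are hypothetical, and Euclidean and cosine distance are assumed here only as examples of simple similarity measures, since the abstract does not name the two that were used. The idea is to retrieve the k nearest nodules once by image features and once by semantic ratings, then measure how many nodules the two result lists share.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def top_k(query_id, vectors, k, distance):
    """Return the ids of the k nodules closest to the query (query excluded)."""
    candidates = [n for n in vectors if n != query_id]
    # Sort by distance; ties are broken by id so the result is deterministic.
    candidates.sort(key=lambda n: (distance(vectors[query_id], vectors[n]), n))
    return candidates[:k]

def overlap(query_id, image_features, semantic_ratings, k, distance=euclidean):
    """Fraction of the top-k content-based results also in the top-k semantic results."""
    by_content = set(top_k(query_id, image_features, k, distance))
    by_semantics = set(top_k(query_id, semantic_ratings, k, distance))
    return len(by_content & by_semantics) / k

# Made-up toy data: two image features and three semantic ratings per nodule.
image_features = {
    "n1": [0.10, 0.80],
    "n2": [0.12, 0.79],
    "n3": [0.90, 0.20],
    "n4": [0.50, 0.50],
}
semantic_ratings = {
    "n1": [4, 2, 5],
    "n2": [4, 3, 5],
    "n3": [2, 4, 3],
    "n4": [1, 5, 3],
}

print(top_k("n1", image_features, 2, euclidean))    # content-based neighbours
print(top_k("n1", semantic_ratings, 2, euclidean))  # semantic-based neighbours
print(overlap("n1", image_features, semantic_ratings, 2))
```

In this toy example the two retrievals agree on one of the two neighbours of `n1`, so the overlap is 0.5. An integrated system could, for instance, combine the two distances into a single score, but how the features are integrated in this work is not specified in the abstract.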