Indexing Large Archives of Pathology Images Using the Unified Medical Language System (UMLS)

The value of any large image archive lies in the ability to select and retrieve images based on features of interest that they contain. We show that images can be automatically encoded, from their descriptive text (image legends), into concept codes of the Unified Medical Language System (UMLS). This technique permits powerful image categorization and retrieval and generalizes to image archives of any size. A collection of 5,465 pathology image legends was encoded into UMLS terms by our computer translation program, which parses plain-text image legends and maps them into lists of UMLS terms. Each image legend yielded an average of 15 index terms, ranging from five terms in the least-indexed legend to 58 terms in the most-indexed legend. The resulting UMLS index can be used to retrieve images even when the chosen query term does not appear in the image legend.
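
The idea of concept-based indexing can be illustrated with a minimal sketch. It is not the translation program described here, whose internals are not given in this text. It assumes a tiny, made-up vocabulary (`TOY_VOCAB`) with invented concept codes (`TOY001`, and so on) standing in for UMLS, and a simple greedy longest-phrase match standing in for the parser. All function names and sample legends are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical toy vocabulary: phrase -> invented concept code.
# These are NOT real UMLS identifiers; synonyms share one code.
TOY_VOCAB = {
    "kidney": "TOY001",
    "renal": "TOY001",
    "carcinoma": "TOY002",
    "renal cell carcinoma": "TOY003",
    "hypernephroma": "TOY003",
    "liver": "TOY004",
    "hepatic": "TOY004",
}
MAX_PHRASE_WORDS = max(len(phrase.split()) for phrase in TOY_VOCAB)


def encode_legend(text):
    """Map free text to a set of concept codes by greedy longest-phrase match."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    codes = set()
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_PHRASE_WORDS, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in TOY_VOCAB:
                codes.add(TOY_VOCAB[phrase])
                i += n
                break
        else:
            i += 1  # no phrase starts at this word; move on
    return codes


def build_index(legends):
    """Build an inverted index: concept code -> set of image identifiers."""
    index = defaultdict(set)
    for image_id, legend in legends.items():
        for code in encode_legend(legend):
            index[code].add(image_id)
    return index


def search(index, query):
    """Encode the query the same way as the legends, then look up its codes."""
    hits = set()
    for code in encode_legend(query):
        hits |= index.get(code, set())
    return hits


# Made-up sample legends for illustration only.
legends = {
    "img1": "Renal cell carcinoma, clear cell type, low power view.",
    "img2": "Normal hepatic parenchyma.",
}
index = build_index(legends)
```

In this sketch, `search(index, "hypernephroma")` finds `img1` and `search(index, "liver")` finds `img2`, although neither query word occurs in the corresponding legend, because the query and the legend map to the same concept code. A real system would draw its phrases and codes from the UMLS Metathesaurus rather than from a hand-written dictionary.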