Effective Metadata Discovery for Dynamic Filtering of Queries to a Radiology Image Search Engine

We sought to demonstrate the effectiveness of techniques to index radiology images using metadata discovered in their free-text figure captions. The ARRS GoldMiner™ image library incorporated 94,256 figures from 11,712 articles published in peer-reviewed online radiology journals. Algorithms were developed to discover three metadata elements (age, sex, and imaging modality) from the figures’ free-text captions. Age was recorded in years and classified as infant (less than 2 years), child (2 to 17 years), or adult (18 years or older). Each figure was assigned to one of eight imaging modalities. A random sample of 1,000 images was examined to measure the accuracy of the metadata. The patient’s age was identified in 58,994 cases (63%), and the patient’s sex was identified in 58,427 cases (62%). An imaging modality was assigned to 80,402 (85%) of the figures. Based on the 1,000 sampled cases, recall values for age, sex, and imaging modality were 97.2%, 99.7%, and 86.4%, respectively; precision values were 100%, 100%, and 97.2%, respectively. Automated techniques can accurately discover age, sex, and imaging modality metadata from captions of figures published in radiology journals, and the metadata can be used to dynamically filter queries for an image search engine.
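
The abstract does not describe how the discovery algorithms work, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes simple regular-expression matching over caption text. The patterns, the sex vocabulary, the four modality labels (a stand-in for the eight modalities, which are not named here), the conversion of months, weeks, and days to fractional years, and the function names `extract_metadata`, `precision_recall`, and `filter_figures` are all hypothetical. Only the age-group thresholds come from the text above. Precision and recall use the standard definitions, which the abstract is assumed to follow.

```python
import re

# Hypothetical pattern for phrases such as "45-year-old" or "3-month-old".
AGE_RE = re.compile(r"\b(\d{1,3})[- ](year|month|week|day)s?[- ]old\b", re.IGNORECASE)
# Assumption: sub-year ages are converted to fractional years.
UNIT_TO_YEARS = {"year": 1.0, "month": 1 / 12, "week": 1 / 52, "day": 1 / 365}

# Hypothetical sex vocabulary; word boundaries keep "man" from matching "woman".
SEX_TERMS = {"man": "male", "male": "male", "boy": "male",
             "woman": "female", "female": "female", "girl": "female"}
SEX_RE = re.compile(r"\b(" + "|".join(SEX_TERMS) + r")\b", re.IGNORECASE)

# Illustrative subset of modalities; the source's eight categories are not listed.
MODALITY_PATTERNS = {
    "CT": re.compile(r"\b(CT|computed tomography)\b", re.IGNORECASE),
    "MRI": re.compile(r"\b(MRI?|[Mm]agnetic resonance)\b"),
    "ultrasound": re.compile(r"\b(ultrasound|sonogra\w+)\b", re.IGNORECASE),
    "radiograph": re.compile(r"\bradiograph\w*\b", re.IGNORECASE),
}


def age_group(age_years):
    """Classify age using the thresholds stated in the abstract."""
    if age_years < 2:
        return "infant"
    if age_years < 18:
        return "child"
    return "adult"


def extract_metadata(caption):
    """Return age, age group, sex, and modality found in a caption (None if absent)."""
    age_years = None
    age_match = AGE_RE.search(caption)
    if age_match:
        age_years = int(age_match.group(1)) * UNIT_TO_YEARS[age_match.group(2).lower()]
    sex_match = SEX_RE.search(caption)
    sex = SEX_TERMS[sex_match.group(1).lower()] if sex_match else None
    # The first modality whose pattern matches wins; a real system would need tie-breaking.
    modality = next(
        (name for name, pattern in MODALITY_PATTERNS.items() if pattern.search(caption)),
        None,
    )
    return {
        "age_years": age_years,
        "age_group": age_group(age_years) if age_years is not None else None,
        "sex": sex,
        "modality": modality,
    }


def precision_recall(predicted, truth):
    """Standard precision and recall over parallel lists; None means 'no value'."""
    true_pos = sum(1 for p, t in zip(predicted, truth) if p is not None and p == t)
    n_predicted = sum(1 for p in predicted if p is not None)
    n_truth = sum(1 for t in truth if t is not None)
    precision = true_pos / n_predicted if n_predicted else 0.0
    recall = true_pos / n_truth if n_truth else 0.0
    return precision, recall


def filter_figures(records, **criteria):
    """Keep only records whose metadata equals every supplied criterion."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


if __name__ == "__main__":
    captions = [
        "Axial CT image of a 45-year-old woman with abdominal pain.",
        "Sagittal T2-weighted MR image in a 3-month-old boy.",
        "Chest radiographs show a nodule.",
    ]
    records = [extract_metadata(c) for c in captions]
    for record in records:
        print(record)
    print(filter_figures(records, age_group="infant", modality="MRI"))
```

In this sketch, a query filter such as `filter_figures(records, age_group="infant", modality="MRI")` narrows results to figures whose discovered metadata matches, which is one plausible reading of the dynamic filtering described above. Captions that state no age, sex, or modality simply yield `None` for that field, consistent with the reported coverage being below 100%.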