Automated collection of medical images for research from heterogeneous systems: trials and tribulations

Radiological imaging is fundamental to healthcare and is now routinely used for diagnosis, disease monitoring and treatment planning. Over the past two decades, both diagnostic and therapeutic imaging have grown rapidly, and harnessing this large influx of medical images could provide an essential resource for research and training. Traditionally, the systematic collection of medical images for research from heterogeneous sites has not been commonplace within the NHS. It is fraught with challenges, including data acquisition, storage, secure transfer and correct anonymisation. Here, we describe a semi-automated system that oversees the collection of both unprocessed and processed medical images, from acquisition through to a centralised database. Including unprocessed images in our repository opens up a wide range of potential research applications. Furthermore, we have developed systems and software to integrate these images with their associated clinical data and annotations, providing a centralised dataset for research. We currently collect digital mammography images regularly from two sites and partially from a further three, and efforts to expand to other modalities and sites are ongoing. To date, we have collected 34,014 2D images from 2,623 individuals. In this paper, we describe our medical image collection system for research and discuss the wide range of challenges faced during the design and implementation of such systems.