Pseudonymisation of radiology data for research purposes

Medical image processing methods and algorithms, developed by researchers, need to be validated and tested. Test data should ideally be real clinical data especially when that clinical data is varied and exists in large volume. In nowadays, clinical data is accessible electronically and has important value for researchers. However, the usage of clinical data for research purposes should respect data confidentiality, patient right to privacy and the patient consent. In fact, clinical data is nominative given that it contains information about the patient such as name, age and identification number. Evidently, clinical data should be de-identified to be exported to research databases. However, the same patient is usually followed during a long period of time. The disease progression and the diagnostic evolution represent extremely valuable information for researchers, as well. Our objective is to build a research database from de-identified clinical data while enabling the database to be easily incremented by exporting new pseudonymous data, acquired over a long period of time. Pseudonymisation is data de-identification such that data belonging to the same individual in the clinical environment bear the same relation to each other in the de-identified research version. In this paper, we propose a software architecture that enables the implementation of a research database that can be incremented in time. We also evaluate its security and discuss its security pitfalls.