Proposal and Assessment of a De-Identification Strategy to Enhance Anonymity of the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM) in a Public Cloud-Computing Environment: Anonymization of Medical Data Using Privacy Models

Background De-identifying personal information is critical when using personal health data for secondary research. The Observational Medical Outcomes Partnership Common Data Model (CDM), defined by the nonprofit organization Observational Health Data Sciences and Informatics, has been gaining attention for its use in the analysis of patient-level clinical data obtained from various medical institutions. When analyzing such data in a public environment such as a cloud-computing system, an appropriate de-identification strategy is required to protect patient privacy. Objective This study proposes and evaluates a de-identification strategy that is comprised of several rules along with privacy models such as k-anonymity, l-diversity, and t-closeness. The proposed strategy was evaluated using the actual CDM database. Methods The CDM database used in this study was constructed by the Anam Hospital of Korea University. Analysis and evaluation were performed using the ARX anonymizing framework in combination with the k-anonymity, l-diversity, and t-closeness privacy models. Results The CDM database, which was constructed according to the rules established by Observational Health Data Sciences and Informatics, exhibited a low risk of re-identification: The highest re-identifiable record rate (11.3%) in the dataset was exhibited by the DRUG_EXPOSURE table, with a re-identification success rate of 0.03%. However, because all tables include at least one “highest risk” value of 100%, suitable anonymizing techniques are required; moreover, the CDM database preserves the “source values” (raw data), a combination of which could increase the risk of re-identification. Therefore, this study proposes an enhanced strategy to de-identify the source values to significantly reduce not only the highest risk in the k-anonymity, l-diversity, and t-closeness privacy models but also the overall possibility of re-identification. Conclusions Our proposed de-identification strategy effectively enhanced the privacy of the CDM database, thereby encouraging clinical research involving multiple centers.

[1]  Latanya Sweeney,et al.  k-Anonymity: A Model for Protecting Privacy , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[2]  Ninghui Li,et al.  t-Closeness: Privacy Beyond k-Anonymity and l-Diversity , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[3]  Khaled El Emam,et al.  Guide to the De-Identification of Personal Health Information , 2013 .

[4]  A. Meyer The Health Insurance Portability and Accountability Act. , 1997, Tennessee medicine : journal of the Tennessee Medical Association.

[5]  Fabian Prasser,et al.  Putting Statistical Disclosure Control into Practice: The ARX Data Anonymization Tool , 2015, Medical Data Privacy Handbook.

[6]  S. Papson “Model” , 1981 .

[7]  Barbara Pierscionek,et al.  Contextual Anonymization for Secondary Use of Big Data in Biomedical Research: Proposal for an Anonymization Matrix , 2018, JMIR medical informatics.

[8]  Anat Reiner Benaim,et al.  Analyzing Medical Research Results Based on Synthetic Data and Their Relation to Real Data Results: Systematic Comparison From Five Observational Studies , 2020, JMIR medical informatics.

[9]  Hans-Ulrich Prokosch,et al.  A scoping review of cloud computing in healthcare , 2015, BMC Medical Informatics and Decision Making.

[10]  ASHWIN MACHANAVAJJHALA,et al.  L-diversity: privacy beyond k-anonymity , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[11]  Rebecca Herold,et al.  The practical guide to HIPAA privacy and security compliance , 2003 .

[12]  Rae Woong Park,et al.  Sharing Clinical Big Data While Protecting Confidentiality and Security: Observational Health Data Sciences and Informatics , 2017, Healthcare informatics research.

[13]  Cord Spreckelsen,et al.  Privacy-Preserving Record Grouping and Consent Management Based on a Public-Private Key Signature Scheme: Theoretical Analysis and Feasibility Study , 2019, Journal of medical Internet research.