Service for the Pseudonymization of Electronic Healthcare Records Based on ISO/EN 13606 for the Secondary Use of Information

The availability of electronic health data supports scientific progress through the creation of repositories for secondary use. Data anonymization is a mandatory step to comply with current legislation. This paper presents a service for the pseudonymization of electronic healthcare record (EHR) extracts, intended to facilitate the exchange of clinical information for secondary use in compliance with data protection legislation. According to ISO/TS 25237, pseudonymization is a particular type of anonymization. The tool anonymizes extracts while retaining three quasi-identifiers (gender, date of birth, and place of residence) at a level of detail selected by the user. The system is based on the ISO/EN 13606 standard and exploits those of its characteristics that are particularly favorable to anonymization. The service consists of two independent modules: the demographic server and the pseudonymizing module. The demographic server provides permanent storage of the demographic entities and manages the identifiers. The pseudonymizing module anonymizes the ISO/EN 13606 extracts. The pseudonymizing process consists of four phases: storage of the demographic information included in the extract, substitution of the identifiers, removal of the demographic information from the extract, and removal of key data from free-text fields. The pseudonymizing system was used in three telemedicine research projects with satisfactory results. A problem was detected with the data type of one demographic data field, and a modification proposal was prepared for the group responsible for drafting and revising the ISO/EN 13606 standard.
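The retention of the three quasi-identifiers at a user-selected level of detail can be illustrated with a minimal Python sketch. It is not the service's implementation: the `Demographics` class, the function names, the generalization levels (`"full"`, `"month"`, `"year"`, `"decade"`), and the representation of residence as a tuple ordered from most general to most specific are all assumptions made for illustration, and the sketch does not use the ISO/EN 13606 data structures.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Tuple


@dataclass(frozen=True)
class Demographics:
    """Hypothetical container for the three quasi-identifiers."""
    gender: str
    birth_date: date
    # Assumed ordering: most general first, e.g. (country, region, city).
    residence: Tuple[str, ...]


def generalize_birth_date(d: date, level: str) -> str:
    """Reduce the date of birth to the requested (assumed) level of detail."""
    if level == "full":
        return d.isoformat()
    if level == "month":
        return f"{d.year:04d}-{d.month:02d}"
    if level == "year":
        return f"{d.year:04d}"
    if level == "decade":
        return f"{d.year // 10 * 10}s"
    raise ValueError(f"unknown birth date level: {level!r}")


def generalize_residence(parts: Tuple[str, ...], depth: int) -> Tuple[str, ...]:
    """Keep only the first `depth` components of the place of residence."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return parts[:depth]


def quasi_identifiers(demo: Demographics, birth_level: str,
                      residence_depth: int) -> Dict[str, object]:
    """Build the quasi-identifier set that would stay in the extract."""
    return {
        "gender": demo.gender,
        "birth_date": generalize_birth_date(demo.birth_date, birth_level),
        "residence": generalize_residence(demo.residence, residence_depth),
    }


# Usage: keep the year of birth and the residence down to region level.
patient = Demographics("F", date(1975, 6, 14), ("ES", "Madrid", "Majadahonda"))
retained = quasi_identifiers(patient, birth_level="year", residence_depth=2)
print(retained)
```

Coarser settings (for example `"decade"` and a residence depth of 1) lower the re-identification risk at the cost of less detail for secondary analysis, which is the trade-off the user-selected level of detail exposes.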
