Design and implementation of a privacy preserving electronic health record linkage tool in Chicago

OBJECTIVE To design and implement a tool that creates a secure, privacy-preserving linkage of electronic health record (EHR) data across multiple sites in a large metropolitan area in the United States (Chicago, IL), for use in clinical research.

METHODS The authors developed and distributed a software application that performs standardized data cleaning, preprocessing, and hashing of patient identifiers to remove all protected health information. The application creates seeded hash code combinations of patient identifiers using a Health Insurance Portability and Accountability Act (HIPAA)-compliant SHA-512 algorithm that minimizes re-identification risk. The authors subsequently linked individual records using a central honest broker with an algorithm that assigns weights to hash combinations in order to generate high-specificity matches.

RESULTS The software application successfully linked and de-duplicated 7 million records across 6 institutions, resulting in a cohort of 5 million unique records. Using a manually reconciled set of 11 292 patients as a gold standard, the software achieved a sensitivity of 96% and a specificity of 100%, with a majority of the missed matches accounted for by patients with both a missing social security number and a last name change. Using 3 disease examples, the authors demonstrate that the software can reduce duplication of patient records across sites by as much as 28%.

CONCLUSIONS Software that standardizes the assignment of a unique seeded hash identifier, merged through an agreed-upon third-party honest broker, can enable large-scale secure linkage of EHR data for epidemiologic and public health research. Because patients may use multiple healthcare systems, the software algorithm can improve future epidemiologic research by providing more comprehensive data.
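The hashing and weighted-matching steps described under METHODS can be illustrated with a minimal Python sketch. This is not the authors' implementation: the field names (`first`, `last`, `dob`, `ssn4`), the seed value, the identifier combinations, their weights, and the match threshold are all hypothetical choices made for illustration. The only elements taken from the abstract are the use of a seeded SHA-512 hash over combinations of cleaned identifiers and a broker-side algorithm that weights matching hash combinations.

```python
import hashlib
import unicodedata

# Hypothetical shared seed; in practice it would be agreed on by all sites
# and never disclosed to the honest broker.
SEED = "example-shared-seed"

# Hypothetical identifier combinations and illustrative weights.
COMBOS = {
    ("first", "last", "dob", "ssn4"): 1.0,
    ("last", "dob", "ssn4"): 0.8,
    ("first", "last", "dob"): 0.6,
}

# Hypothetical cutoff chosen to favor specificity over sensitivity.
THRESHOLD = 0.75


def clean(value):
    """Standardize an identifier: strip accents, punctuation, spaces, and case."""
    text = unicodedata.normalize("NFKD", value or "")
    return "".join(ch for ch in text.upper() if ch.isascii() and ch.isalnum())


def hash_record(record, seed=SEED):
    """Site side: return one seeded SHA-512 hex digest per usable combination."""
    hashes = {}
    for fields in COMBOS:
        parts = [clean(record.get(name, "")) for name in fields]
        if not all(parts):
            continue  # skip combinations that rely on a missing identifier
        payload = "|".join([seed, *parts]).encode("utf-8")
        hashes[fields] = hashlib.sha512(payload).hexdigest()
    return hashes


def match_score(hashes_a, hashes_b):
    """Broker side: weight of the strongest combination whose hashes agree."""
    agreeing = [
        COMBOS[fields]
        for fields, digest in hashes_a.items()
        if hashes_b.get(fields) == digest
    ]
    return max(agreeing, default=0.0)


def is_match(hashes_a, hashes_b, threshold=THRESHOLD):
    """Declare a link only when the best agreeing combination is strong enough."""
    return match_score(hashes_a, hashes_b) >= threshold


site_a = hash_record({"first": "José", "last": "O'Brien", "dob": "1980-01-02", "ssn4": "1234"})
site_b = hash_record({"first": "jose", "last": "OBrien ", "dob": "19800102", "ssn4": "1234"})
no_ssn = hash_record({"first": "Jose", "last": "OBrien", "dob": "1980-01-02"})

print(is_match(site_a, site_b))  # identifiers agree after cleaning
print(is_match(site_a, no_ssn))  # only the weaker name-and-birth-date combination agrees
```

In this sketch the broker only ever sees digests, never the cleaned identifiers. Setting the threshold above the weight of the weakest combination trades sensitivity for specificity, which loosely mirrors the abstract's observation that missed matches cluster among patients with a missing social security number.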
