Linking Health Records for Federated Query Processing

Abstract: A federated query portal in an electronic health record infrastructure enables large epidemiology studies by combining data from geographically dispersed medical institutions. However, an individual’s health record has been found to be distributed across multiple carrier databases in local settings. Privacy regulations may prohibit a data source from revealing clear-text identifiers, making it non-trivial for a query aggregator to determine which records correspond to the same underlying individual. In this paper, we explore the problem of privately detecting and tracking the health records of an individual in a distributed infrastructure. We begin with a secure set intersection protocol based on commutative encryption and show how to make it practical on comparison spaces as large as 10^10 pairs. Using bigram matching, precomputed tables, and data parallelism, we reduce the execution time to a matter of minutes while retaining a high degree of accuracy, even on records with data entry errors. We also propose techniques to prevent the inference of identifier information when an adversary knows the underlying data distributions. Finally, we discuss how records can be tracked during query processing using the detection results.
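To make the core idea concrete, the following is a minimal sketch of bigram matching over commutatively encrypted values. It is not the protocol from the paper: the modulus `P`, the helper names (`keygen`, `hash_to_group`, `encrypt`, `bigrams`, `private_dice`), and the use of the Dice coefficient as the similarity score are all assumptions made for illustration. The cipher is plain modular exponentiation, which commutes because (x^a)^b = (x^b)^a mod P, so each site can encrypt the other site's ciphertexts and compare the doubly encrypted bigrams without seeing clear-text identifiers.

```python
import hashlib
import math
import secrets

# Assumed public modulus: the Mersenne prime 2**127 - 1 keeps the demo fast.
# It is far too small for real security and is chosen for illustration only.
P = 2**127 - 1


def keygen() -> int:
    """Pick a secret exponent coprime to P - 1 so encryption is a bijection."""
    while True:
        key = secrets.randbelow(P - 3) + 2
        if math.gcd(key, P - 1) == 1:
            return key


def hash_to_group(value: str) -> int:
    """Map a string to an integer in [1, P - 1] using SHA-256."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def encrypt(x: int, key: int) -> int:
    """Exponentiation cipher: applying two keys in either order gives the same result."""
    return pow(x, key, P)


def bigrams(text: str) -> set:
    """Set of adjacent character pairs, e.g. 'anna' -> {'an', 'nn', 'na'}."""
    text = text.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def private_dice(name_a: str, name_b: str) -> float:
    """Dice coefficient computed on doubly encrypted bigrams (hypothetical helper).

    Both sites are simulated in one process here; in a real deployment each
    site would hold only its own key and exchange ciphertexts over a network.
    """
    key_a, key_b = keygen(), keygen()
    # Each site encrypts its own hashed bigrams with its own key.
    a_once = [encrypt(hash_to_group(g), key_a) for g in bigrams(name_a)]
    b_once = [encrypt(hash_to_group(g), key_b) for g in bigrams(name_b)]
    # The sites swap ciphertexts and apply their own key on top.
    a_twice = {encrypt(c, key_b) for c in a_once}
    b_twice = {encrypt(c, key_a) for c in b_once}
    if not a_twice and not b_twice:
        return 0.0
    return 2 * len(a_twice & b_twice) / (len(a_twice) + len(b_twice))


if __name__ == "__main__":
    # A single-character data entry error still yields a high similarity score.
    print(private_dice("jonathan", "jonathon"))
    print(private_dice("jonathan", "margaret"))
```

Because the score is computed on bigram sets rather than whole identifiers, a single typo changes only a few bigrams and the pair can still be flagged as a likely match. A threshold on the score would be an additional assumption that needs tuning on real data.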
