Privacy-preserving matching of similar patients

The identification of similar entities represented by records in different databases has drawn considerable attention in many application areas, including the health domain. One important type of entity matching application, vital for quality healthcare analytics, is the identification of similar patients, known as similar patient matching. A key component of identifying similar records is calculating the similarity between the values of corresponding attributes (fields) in these records. Due to increasing privacy and confidentiality concerns, using the actual attribute values of patient records to identify similar records across different organizations is becoming increasingly problematic, because the attributes in such records often contain highly sensitive information such as personal and medical details of patients. Therefore, matching needs to be based on masked (encoded) values while remaining effective, and efficient enough to allow matching of large databases. Bloom filter encoding has been widely used as an efficient masking technique for privacy-preserving matching of string and categorical values. However, no work has been presented in the literature on Bloom filter-based masking of numerical data, such as integer (e.g. age), floating-point (e.g. body mass index), and modulus values (numbers that wrap around upon reaching a certain value, e.g. date and time), all of which are commonly required in the health domain. We propose a framework with novel methods for masking numerical data using Bloom filters, thereby facilitating the calculation of similarities between records. We conduct an empirical study on publicly available real-world datasets, which shows that our framework provides efficient masking and achieves matching accuracy similar to that of matching the actual unencoded patient records.
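The abstract does not spell out the encoding itself. As a minimal sketch, assuming a neighbourhood-based scheme (a number is represented by the set of grid values within a symmetric interval around it, so that nearby numbers share Bloom filter bits), the following Python shows how string q-grams and integer, floating-point and modulus values could be hashed into Bloom filters and compared with the Dice coefficient. The parameter values, the HMAC key, and the helper names (`bit_positions`, `encode`, `dice`, `qgrams`, `neighbours`) are hypothetical illustrations, not the authors' implementation.

```python
import hashlib
import hmac
from typing import Iterable, Optional, Set

# Hypothetical parameters for illustration only; not taken from the paper.
BF_LEN = 1000               # bits per Bloom filter
NUM_HASH = 10               # hash functions per element
SECRET_KEY = b"shared-key"  # hypothetical secret agreed between database owners


def bit_positions(element: str) -> Set[int]:
    """Map one element to up to NUM_HASH bit positions using keyed HMAC-SHA256."""
    positions = set()
    for i in range(NUM_HASH):
        msg = f"{i}|{element}".encode("utf-8")
        digest = hmac.new(SECRET_KEY, msg, hashlib.sha256).digest()
        positions.add(int.from_bytes(digest, "big") % BF_LEN)
    return positions


def encode(elements: Iterable[str]) -> Set[int]:
    """Bloom filter stored sparsely as the set of positions whose bit is 1."""
    bits: Set[int] = set()
    for element in elements:
        bits |= bit_positions(element)
    return bits


def dice(a: Set[int], b: Set[int]) -> float:
    """Dice coefficient 2|A and B| / (|A| + |B|) over the 1-bits of two filters."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def qgrams(value: str, q: int = 2) -> list:
    """Padded q-grams of a string, e.g. 'ab' -> ['_a', 'ab', 'b_'] for q = 2."""
    padded = f"_{value.lower()}_"
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]


def neighbours(value: float, step: float, width: int,
               modulus: Optional[float] = None) -> list:
    """Represent a number by its neighbours value - width*step .. value + width*step.

    Values are snapped to a grid of size `step` and written as integer grid
    indices, so equal neighbours always yield identical strings (no float
    formatting mismatches). With `modulus` (e.g. 24 for hour of day) the
    neighbour indices wrap around.
    """
    centre = round(value / step)
    cycle = round(modulus / step) if modulus is not None else None
    elements = []
    for offset in range(-width, width + 1):
        index = centre + offset
        if cycle is not None:
            index %= cycle
        elements.append(str(index))
    return elements


if __name__ == "__main__":
    # Integer attribute (age): 42 and 44 share 5 of 7 neighbour elements.
    print(dice(encode(neighbours(42, 1, 3)), encode(neighbours(44, 1, 3))))
    # Floating-point attribute (BMI) on a 0.1 grid.
    print(dice(encode(neighbours(24.3, 0.1, 3)), encode(neighbours(24.5, 0.1, 3))))
    # Modulus attribute (hour of day): 23:00 and 01:00 are close once values wrap.
    print(dice(encode(neighbours(23, 1, 2, modulus=24)),
               encode(neighbours(1, 1, 2, modulus=24))))
    # String attribute (surname) encoded as bigrams.
    print(dice(encode(qgrams("smith")), encode(qgrams("smyth"))))
```

Because two values that are `d` grid steps apart (with `d` at most `2 * width`) share `2 * width + 1 - d` of their neighbour elements, the Dice similarity of their filters should fall roughly linearly with `d`, up to the noise introduced by hash collisions. The wrap-around in `neighbours` is what lets 23:00 and 01:00 share elements, which a plain integer encoding would miss.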
