A Comparison of Statistical Linkage Keys with Bloom Filter-based Encryptions for Privacy-preserving Record Linkage using Real-world Mammography Data

New EU regulations on the need to encrypt personal identifiers for linking data will increase the importance of Privacy-Preserving Record Linkage (PPRL) techniques over the coming years. Currently, the use of Anonymous Linkage Codes (ALCs) is the standard procedure for PPRL of medical databases. Recently, Bloom filter-based encodings of pseudo-identifiers such as names have received increasing attention for PPRL tasks. In contrast to most previous research in PPRL, which is based on simulated data, we compare the performance of ALCs and Bloom filter-based linkage keys using real data from a large regional breast cancer screening program. This mammography database contains nearly 200,000 records. We compare precision and recall for linking the data set existing at time t0 with new incident cases occurring after t0, using different encoding and matching strategies for the personal identifiers. Enhancing ALCs with an additional identifier (place of birth) yields better recall than standard ALCs. Encoding the same information in Bloom filters with recommended parameter settings yields higher recall than ALCs while preserving precision.
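
To make the Bloom filter side of the comparison concrete, the following is a minimal sketch of how a name might be encoded and compared. It is not the configuration used in the study: the bigram padding, the filter length of 1000 bits, the 10 hash functions, the double-hashing scheme built from HMAC-SHA256 and HMAC-SHA512, the Dice similarity, and the function names `bigrams`, `bloom_encode`, and `dice` are all assumptions chosen for illustration.

```python
import hashlib
import hmac


def bigrams(name):
    """Split a name into padded character bigrams (padding is an assumption)."""
    padded = f" {name.strip().lower()} "
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(name, secret, length=1000, num_hashes=10):
    """Return the set of bit positions set to 1 for the given name.

    Positions are derived by double hashing, (h1 + i * h2) mod length,
    with two keyed HMACs. length and num_hashes are illustrative values,
    not the parameters recommended or used in the paper.
    """
    positions = set()
    for gram in bigrams(name):
        data = gram.encode("utf-8")
        h1 = int.from_bytes(hmac.new(secret, data, hashlib.sha256).digest(), "big")
        h2 = int.from_bytes(hmac.new(secret, data, hashlib.sha512).digest(), "big")
        for i in range(num_hashes):
            positions.add((h1 + i * h2) % length)
    return positions


def dice(a, b):
    """Dice coefficient of two sets of bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical shared secret; in practice the data owners would agree on it.
key = b"example-shared-secret"
similar = dice(bloom_encode("Meier", key), bloom_encode("Meyer", key))
different = dice(bloom_encode("Meier", key), bloom_encode("Schmidt", key))
print(f"Meier vs Meyer:   {similar:.2f}")
print(f"Meier vs Schmidt: {different:.2f}")
```

Because spelling variants share most of their bigrams, their filters overlap heavily, so a similarity threshold can tolerate typing errors that would break an exact-match linkage code. This is one plausible reason for the recall difference reported above.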
