A Novel Error-Tolerant Anonymous Linking Code

An anonymous linking code is an encrypted key for linking data from different sources. So far, quite simple algorithms for generating such codes from personal characteristics, such as names and date of birth, are in common use. These algorithms yield many non-matching codes when the underlying identifier values contain errors. We have previously suggested the use of Bloom filters for calculating string similarities in a privacy-preserving manner. Here, we claim that this principle can also be used for a novel error-tolerant but still irreversible encrypted key. We call the proposed code the Cryptographic Longterm Key. It consists of a single Bloom filter into which the identifiers are stored one after another. Tests on simulated databases yield linkage results comparable to those obtained with unencrypted identifiers and superior to those of previously existing methods. Since the Cryptographic Longterm Key can easily be adapted to quite different prerequisites, it might be useful for many applications.
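The abstract does not spell out how identifiers enter the single Bloom filter. The following is a minimal sketch of the idea, assuming the scheme commonly used for Bloom-filter record linkage. Each identifier is split into padded bigrams, and each bigram sets k bits chosen by keyed double hashing (HMAC-SHA1 and HMAC-MD5). Two codes are then compared with the Dice coefficient. The filter length (1000 bits), k = 20, the secret key, and all function names are hypothetical illustrations, not the paper's specification.

```python
import hashlib
import hmac
from typing import Iterable

# All values below are illustrative assumptions, not parameters from the paper.
FILTER_LENGTH = 1000                      # m: number of bits in the Bloom filter
HASHES_PER_BIGRAM = 20                    # k: bit positions set per bigram
SECRET_KEY = b"hypothetical-shared-key"   # key assumed to be shared by the data holders


def bigrams(value: str) -> set:
    """Split a normalized identifier into padded bigrams ('ann' -> ' a', 'an', 'nn', 'n ')."""
    padded = f" {value.strip().lower()} "
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bit_positions(token: str, k: int, m: int, key: bytes) -> list:
    """Keyed double hashing: g_i = (h1 + i * h2) mod m for i = 0..k-1."""
    data = token.encode("utf-8")
    h1 = int.from_bytes(hmac.new(key, data, hashlib.sha1).digest(), "big")
    h2 = int.from_bytes(hmac.new(key, data, hashlib.md5).digest(), "big")
    return [(h1 + i * h2) % m for i in range(k)]


def clk(identifiers: Iterable[str], m: int = FILTER_LENGTH,
        k: int = HASHES_PER_BIGRAM, key: bytes = SECRET_KEY) -> list:
    """Store the bigrams of all identifiers, one after another, in ONE m-bit Bloom filter."""
    bloom = [0] * m
    for value in identifiers:
        for gram in bigrams(value):
            for pos in bit_positions(gram, k, m, key):
                bloom[pos] = 1
    return bloom


def dice(a: list, b: list) -> float:
    """Dice coefficient 2|A AND B| / (|A| + |B|), counting 1-bits of two filters."""
    common = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2 * common / total if total else 1.0


if __name__ == "__main__":
    a = clk(["Peter", "Mueller", "1970-01-23"])
    b = clk(["Petr", "Mueller", "1970-01-23"])   # same person, typo in the first name
    c = clk(["Anna", "Schmidt", "1985-07-02"])   # a different person
    print(f"typo pair: {dice(a, b):.2f}, different persons: {dice(a, c):.2f}")
```

A typo changes only a few bigrams, so most set bits still agree and the Dice similarity stays high. In contrast, a code that hashes the whole identifier string would change completely. Recovering the names from the filter requires the secret key, which is what keeps the code encrypted in this sketch.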
