Private record linkage with Bloom filters

In many record linkage applications, identifiers have to be encrypted to preserve privacy, so a method for approximate string comparison on encrypted identifiers is needed. We describe a new method of approximate string comparison for private record linkage. The main idea is to store the q-gram sets derived from identifier values in Bloom filters and to compare these filters bitwise across databases. This exploits the cryptographic features of Bloom filters while still allowing string similarities to be calculated. We show that the proposed method compares well with evaluating string comparison functions on the plain-text values of identifiers.
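
The idea of encoding q-gram sets in Bloom filters and comparing them bitwise can be illustrated with a minimal sketch. The abstract does not specify any parameters, so everything below is an assumption chosen for illustration: the function names (`qgrams`, `bloom_encode`, `dice`), padded bigrams (q = 2), a filter length of 1000 bits, 30 hash functions derived by double hashing from keyed HMAC-SHA1 and HMAC-MD5, and the Dice coefficient as the similarity measure. The shared secret key is likewise hypothetical.

```python
import hashlib
import hmac


def qgrams(value: str, q: int = 2) -> set:
    """Return the set of q-grams of a lowercased, blank-padded string."""
    padded = " " * (q - 1) + value.lower() + " " * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def bloom_encode(value: str, key: bytes, m: int = 1000, k: int = 30, q: int = 2) -> int:
    """Encode the q-gram set of `value` in an m-bit Bloom filter (stored as an int).

    Assumption: the k bit positions per q-gram come from double hashing,
    position_i = (h1 + i * h2) mod m, with two keyed HMACs as h1 and h2.
    """
    bits = 0
    for gram in qgrams(value, q):
        data = gram.encode("utf-8")
        h1 = int.from_bytes(hmac.new(key, data, hashlib.sha1).digest(), "big")
        h2 = int.from_bytes(hmac.new(key, data, hashlib.md5).digest(), "big")
        for i in range(k):
            bits |= 1 << ((h1 + i * h2) % m)
    return bits


def dice(a: int, b: int) -> float:
    """Dice coefficient of two Bloom filters: 2 * |a AND b| / (|a| + |b|)."""
    total = bin(a).count("1") + bin(b).count("1")
    if total == 0:
        return 1.0
    return 2 * bin(a & b).count("1") / total


if __name__ == "__main__":
    key = b"shared-secret"  # hypothetical key agreed on by both data owners
    smith = bloom_encode("smith", key)
    smyth = bloom_encode("smyth", key)
    jones = bloom_encode("jones", key)
    print("smith vs smyth:", round(dice(smith, smyth), 3))
    print("smith vs jones:", round(dice(smith, jones), 3))
```

In this sketch, each data owner would encode its identifiers locally with the shared key and exchange only the resulting bit patterns. Names that share many q-grams set many of the same bits, so similar spellings score higher than unrelated names even though the plain-text values are never compared directly.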
