Automated Cryptanalysis of Bloom Filter Encryptions of Health Records

Privacy-preserving record linkage with Bloom filters has become increasingly popular in medical applications because Bloom filters allow probabilistic linkage of sensitive personal data. However, evidence indicates that Bloom filters are not secure enough for settings that require strong security guarantees, and several improvements have therefore been suggested in the literature. One of these improvements stores several identifiers in a single Bloom filter. In this paper we present an automated cryptanalysis of this Bloom filter variant. The three steps of the procedure constitute our main contributions: (1) a new method for detecting Bloom filter encryptions of bigrams (so-called atoms), (2) the use of an optimization algorithm to assign atoms to bigrams, and (3) the reconstruction of the original attribute values by linkage against bigram sets obtained from lists of frequent attribute values in the underlying population. In summary, we present the first convincing attack on Bloom filter encryptions of records built from more than one identifier.
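
To make the attacked encoding concrete, the following is a minimal sketch of how several identifiers might be stored in one Bloom filter. It is an illustration under stated assumptions, not the exact scheme analysed in the paper: the filter length (1000 bits), the number of hash functions (15), the blank padding of values, the use of SHA-1 and SHA-256 in a double-hashing construction, and all function names are hypothetical choices made for this example. In this sketch, an "atom" is the set of bit positions that a single bigram sets.

```python
import hashlib


def bigrams(value):
    """Split a value into its set of bigrams, padded with blanks (assumed convention)."""
    padded = f" {value.lower()} "
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def atom(bigram, num_bits=1000, num_hashes=15):
    """Bit positions set by one bigram (its 'atom' in this sketch).

    Positions come from double hashing, (h1 + i * h2) mod num_bits,
    with SHA-1 and SHA-256 as the two base hashes (an assumption).
    """
    data = bigram.encode("utf-8")
    h1 = int.from_bytes(hashlib.sha1(data).digest(), "big")
    h2 = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return {(h1 + i * h2) % num_bits for i in range(num_hashes)}


def encode_record(identifiers, num_bits=1000, num_hashes=15):
    """Store the bigrams of all identifiers in one Bloom filter.

    The filter is represented as the set of positions whose bit is 1.
    """
    bits = set()
    for value in identifiers:
        for bg in bigrams(value):
            bits |= atom(bg, num_bits, num_hashes)
    return bits


def dice(a, b):
    """Dice coefficient of two filters, a common similarity measure for linkage."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    # Hypothetical records: first name, surname, and year of birth in one filter.
    rec_a = encode_record(["maria", "schmidt", "1970"])
    rec_b = encode_record(["maria", "schmitt", "1970"])
    print(f"bits set in record A: {len(rec_a)}")
    print(f"Dice similarity A vs B: {dice(rec_a, rec_b):.3f}")
```

Because every bigram always sets the same positions, its atom is contained in every filter built from a value that holds this bigram. This regularity is what makes it plausible to detect atoms across many filters and then assign them to bigrams, as outlined in steps (1) and (2) above.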
