A Sorted Neighborhood Approach to Multidimensional Privacy Preserving Blocking

Privacy Preserving Record Linkage is an emerging field of research that aims to integrate data from heterogeneous data sources while respecting privacy. Because this task is computationally expensive, Privacy Preserving Blocking has been introduced to improve performance by eliminating unrelated candidate pairs. In this paper we present a solution to this problem by introducing the Sorted Neighborhood for Encrypted Fields algorithm and combining it with a secure multidimensional privacy preserving blocking method. Our approach is applicable to all types of data fields and significantly speeds up the Privacy Preserving Record Linkage process without sacrificing matching accuracy. We analytically prove that our method is secure, and we provide empirical evidence of its high performance by comparing it to other established methods.
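
To make the idea of blocking by a sorted neighborhood concrete, the following is a minimal sketch of the classic, plaintext sorted neighborhood method, not the Sorted Neighborhood for Encrypted Fields algorithm introduced in this paper. It assumes two lists of records, a hypothetical sorting-key function `key`, and a hypothetical window size `window`; only records from different sources that fall within the same sliding window become candidate pairs.

```python
from typing import Callable, List, Set, Tuple


def sorted_neighborhood_pairs(
    records_a: List[str],
    records_b: List[str],
    key: Callable[[str], str],
    window: int,
) -> Set[Tuple[int, int]]:
    """Return candidate pairs (index in A, index in B) using a sliding window.

    Illustrative plaintext sketch only: the names and parameters are
    assumptions, and no encryption or privacy protection is applied.
    """
    if window < 2:
        raise ValueError("window must be at least 2")

    # Merge both sources into one list, remembering where each record came from.
    tagged = [(key(r), "A", i) for i, r in enumerate(records_a)]
    tagged += [(key(r), "B", j) for j, r in enumerate(records_b)]
    tagged.sort()  # order by sorting key

    pairs: Set[Tuple[int, int]] = set()
    for start, (_, src, idx) in enumerate(tagged):
        # Compare each record only with the next (window - 1) records.
        for _, other_src, other_idx in tagged[start + 1 : start + window]:
            if src != other_src:  # keep cross-source pairs only
                pairs.add((idx, other_idx) if src == "A" else (other_idx, idx))
    return pairs


# Usage: surnames from two hypothetical sources, sorted by the raw string.
a = ["smith", "jones", "brown"]
b = ["smyth", "johns", "braun"]
candidates = sorted_neighborhood_pairs(a, b, key=lambda s: s, window=2)
```

With a window of 2, this toy input yields 4 candidate pairs instead of the 9 pairs a full comparison would need, which is the kind of reduction blocking is meant to deliver. A larger window trades more comparisons for a lower risk of missing true matches whose keys sort slightly apart.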
