Privacy-preserving record linkage in large databases using secure multiparty computation

Background: Practical applications of data analysis may require combining multiple databases belonging to different owners, such as health centers. The analysis should be performed without violating the privacy of either the centers themselves or the patients whose records these centers store. To avoid biased analysis results, it may be important to remove duplicate records among the centers, so that each patient’s data is taken into account only once. This task is very closely related to privacy-preserving record linkage.

Methods: This paper presents a solution to privacy-preserving deduplication among records of several databases using secure multiparty computation. It is built upon one of the fastest practical secure multiparty computation platforms, called Sharemind.

Results: Tests on ca. 10 million records of simulated databases with 1000 health centers of 10000 records each show that the computation is feasible in practice. The expected running time of the experiment is ca. 30 min for computing servers connected over a 100 Mbit/s WAN, the expected error of the results is 2^(-40), and no errors were detected for the particular test set that we used for our benchmarks.

Conclusions: The solution is ready for practical use. It has well-defined security properties, implied by the properties of the Sharemind platform. The solution assumes that exact matching of records is required; a possible direction for future research is extending it to approximate matching.
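To make the deduplication task concrete, the following is a minimal plaintext sketch of exact-match deduplication across several centers. It is not the secure protocol described in the paper and offers no privacy protection: every key is visible to whoever runs it. The function name `mark_duplicates`, the record keys, and the policy of keeping the first occurrence of each key are assumptions made purely for illustration.

```python
from typing import Hashable, List, Sequence, Set


def mark_duplicates(centers: Sequence[Sequence[Hashable]]) -> List[List[bool]]:
    """For each record, report whether its key was already seen.

    `centers` is a list of record-key lists, one per center (hypothetical
    layout). A flag of True means the record duplicates an earlier one and
    could be dropped before analysis. Matching is exact, mirroring the
    exact-matching assumption stated in the abstract.
    """
    seen: Set[Hashable] = set()
    flags: List[List[bool]] = []
    for records in centers:
        center_flags: List[bool] = []
        for key in records:
            # Assumed policy: the first occurrence is kept, later ones flagged.
            center_flags.append(key in seen)
            seen.add(key)
        flags.append(center_flags)
    return flags


# Toy data: three centers, two patients appear in more than one center.
centers = [
    ["alice|1980-01-01", "bob|1975-05-05"],
    ["carol|1990-09-09", "alice|1980-01-01"],
    ["bob|1975-05-05", "dave|2000-12-12"],
]
flags = mark_duplicates(centers)
```

In a privacy-preserving setting, no single party may see the keys as this sketch does; the paper instead builds the corresponding computation on secure multiparty computation in Sharemind.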
