Formal anonymity models for efficient privacy-preserving joins

Organizations, such as federally funded medical research centers, must share de-identified data on their consumers with publicly accessible repositories to adhere to regulatory requirements. Many repositories are managed by third parties, and it is often unknown whether records received from disparate organizations correspond to the same individual. Failure to resolve this issue can lead to biased (e.g., double counting of identical records) and underpowered (e.g., unlinked records of different data types) investigations. In this paper, we present a secure multiparty computation protocol that enables record joins via consumers' encrypted identifiers. Our solution is more practical than prior secure join models in that data holders need to interact with the third party only once per data submission. Though the basic protocol is technically feasible, its running time scales quadratically with the number of records. Thus, we introduce an extended version of our protocol in which data holders append k-anonymous features of their consumers to their encrypted submissions. These features facilitate a more efficient join computation, while providing a formal guarantee that each record is linkable to no fewer than k individuals in the union of all organizations' consumers. Beyond a theoretical treatment of the problem, we provide an extensive experimental investigation with data derived from the US Census to illustrate the significant gains in efficiency such an approach can achieve.
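The efficiency argument can be illustrated with a minimal sketch. It is not the protocol itself: the opaque string tokens stand in for encrypted identifiers, plain string equality stands in for the secure equality test, and the decade labels are a hypothetical generalized (k-anonymous) feature. The names `naive_comparisons`, `blocked_join`, and `min_block_size` are illustrative assumptions, not names from the paper.

```python
from collections import defaultdict
from itertools import product


def naive_comparisons(a, b):
    """Number of pairwise tests when every record in a meets every record in b."""
    return len(a) * len(b)


def blocked_join(a, b):
    """Join two submissions, testing tokens only inside a shared block.

    Each record is a (token, block) pair. The block is the appended
    generalized feature; the token is a placeholder for an encrypted identifier.
    """
    blocks = defaultdict(lambda: ([], []))
    for token, block in a:
        blocks[block][0].append(token)
    for token, block in b:
        blocks[block][1].append(token)

    matches, comparisons = [], 0
    for left, right in blocks.values():
        for x, y in product(left, right):
            comparisons += 1
            if x == y:  # placeholder for the secure equality test
                matches.append(x)
    return matches, comparisons


def min_block_size(a, b):
    """Smallest number of distinct tokens sharing a block across both submissions.

    A simplified proxy for the k in the anonymity guarantee.
    """
    members = defaultdict(set)
    for token, block in list(a) + list(b):
        members[block].add(token)
    return min(len(tokens) for tokens in members.values())


# Hypothetical submissions from two data holders.
site_a = [("t1", "1970s"), ("t2", "1970s"), ("t3", "1980s"), ("t4", "1980s")]
site_b = [("t2", "1970s"), ("t5", "1970s"), ("t3", "1980s"), ("t6", "1980s")]

matches, comparisons = blocked_join(site_a, site_b)
print(naive_comparisons(site_a, site_b), comparisons, matches)
print(min_block_size(site_a, site_b))
```

In this toy input, blocking halves the number of equality tests while every block still covers at least three distinct tokens. Coarser blocks raise the anonymity level but also raise the number of comparisons, which is the trade-off the extended protocol is described as managing.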
