A flexible approach to distributed data anonymization

Sensitive biomedical data is often collected from distributed sources, involving different information systems and different organizational units. Local autonomy and legal constraints create a need for privacy-preserving integration concepts. In this article, we focus on anonymization, which plays an important role in the re-use of clinical data and the sharing of research data. We present a flexible solution for anonymizing distributed data in the semi-honest model. Prior to the anonymization procedure, an encrypted global view of the dataset is constructed by means of a secure multi-party computation (SMC) protocol. This global representation can then be anonymized. Our approach is not limited to specific anonymization algorithms; instead, it provides pre- and post-processing for a broad spectrum of algorithms and many privacy criteria. We present an extensive analytical and experimental evaluation and discuss which types of methods and criteria are supported. Our prototype demonstrates the approach by implementing k-anonymity, ℓ-diversity, t-closeness and δ-presence with a globally optimal de-identification method in horizontally and vertically distributed setups. The experiments show that our method provides highly competitive performance and offers a practical and flexible solution for anonymizing distributed biomedical datasets.
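To make the anonymization step more concrete, here is a minimal, centralized sketch of what "globally optimal de-identification" under k-anonymity and (distinct) ℓ-diversity can look like once a single global view exists. It is an illustration under loud assumptions, not the prototype described above: it works on plaintext, omits the SMC protocol and encryption entirely, and uses hypothetical generalization hierarchies (`AGE_LEVELS`, `ZIP_LEVELS`), a toy dataset, and a naive exhaustive lattice search that minimizes the sum of generalization levels as a stand-in utility metric.

```python
from collections import Counter, defaultdict
from itertools import product

# Hypothetical full-domain generalization hierarchies (illustrative only).
# Level 0 keeps the original value; higher levels are coarser.
AGE_LEVELS = [
    lambda a: str(a),                                  # exact age
    lambda a: f"{a // 10 * 10}-{a // 10 * 10 + 9}",    # decade, e.g. "30-39"
    lambda a: "*",                                     # fully suppressed
]
ZIP_LEVELS = [
    lambda z: z,                                       # full ZIP code
    lambda z: z[:3] + "**",                            # keep 3 digits
    lambda z: "*****",                                 # fully suppressed
]
HIERARCHIES = [AGE_LEVELS, ZIP_LEVELS]


def generalize(records, levels):
    """Apply one generalization level per quasi-identifier to every record."""
    return [
        (tuple(h[lvl](v) for h, lvl, v in zip(HIERARCHIES, levels, qis)), sensitive)
        for qis, sensitive in records
    ]


def is_k_anonymous(table, k):
    """Every equivalence class (identical quasi-identifiers) has >= k records."""
    return all(c >= k for c in Counter(key for key, _ in table).values())


def is_distinct_l_diverse(table, ell):
    """Every equivalence class contains >= ell distinct sensitive values."""
    classes = defaultdict(set)
    for key, sensitive in table:
        classes[key].add(sensitive)
    return all(len(values) >= ell for values in classes.values())


def optimal_generalization(records, k, ell):
    """Check every node of the generalization lattice and return the
    satisfying node with the smallest total level (None if none qualifies)."""
    best = None
    for levels in product(*(range(len(h)) for h in HIERARCHIES)):
        table = generalize(records, levels)
        if is_k_anonymous(table, k) and is_distinct_l_diverse(table, ell):
            if best is None or sum(levels) < sum(best):
                best = levels
    return best


# Toy dataset: ((age, zip), diagnosis). Invented for illustration.
records = [
    ((34, "81675"), "flu"),
    ((36, "81677"), "cancer"),
    ((38, "81679"), "flu"),
    ((52, "81675"), "asthma"),
    ((55, "81671"), "flu"),
    ((57, "81679"), "cancer"),
]

print(optimal_generalization(records, k=3, ell=2))  # (1, 1)
```

Here `(1, 1)` means ages are generalized to decades and ZIP codes are truncated to three digits, which yields two equivalence classes of three records, each with at least two distinct diagnoses. Practical globally optimal algorithms avoid checking every node by exploiting the monotonicity of criteria such as k-anonymity to prune the lattice; the exhaustive loop is kept here only for readability.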