Improving the Utility of Differentially Private Data Releases via k-Anonymity

A common view in some data anonymization literature is to oppose the "old'' k-anonymity model to the "new'' differential privacy model, which offers more robust privacy guarantees. However, the utility of the masked results provided by differential privacy is usually limited, due to the amount of noise that needs to be added to the output, or because utility can only be guaranteed for a restricted type of queries. This is in contrast with the general-purpose anonymized data resulting from k-anonymity mechanisms, which also focus on preserving data utility. In this paper, we show that a synergy between differential privacy and k-anonymity can be found when the objective is to release anonymized data: k-anonymity can help improving the utility of the differentially private release. Specifically, we show that the amount of noise required to fulfill ε-differential privacy can be reduced if noise is added to a k-anonymous version of the data set, where k-anonymity is reached through a specially designed microaggregation of all attributes. As a result of noise reduction, the analytical utility of the anonymized output data set is increased. The theoretical benefits of our proposal are illustrated in a practical setting with an empirical evaluation on a reference data set.

[1]  Latanya Sweeney,et al.  k-Anonymity: A Model for Protecting Privacy , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[2]  Josep Domingo-Ferrer,et al.  Hybrid microdata using microaggregation , 2010, Inf. Sci..

[3]  Cynthia Dwork,et al.  Differential Privacy , 2006, ICALP.

[4]  Philip S. Yu,et al.  Differentially private data release for data mining , 2011, KDD.

[5]  David Sánchez,et al.  Semantically-grounded construction of centroids for datasets with textual attributes , 2012, Knowl. Based Syst..

[6]  Katrina Ligett,et al.  A Simple and Practical Algorithm for Differentially Private Data Release , 2010, NIPS.

[7]  Pierangela Samarati,et al.  Protecting Respondents' Identities in Microdata Release , 2001, IEEE Trans. Knowl. Data Eng..

[8]  Aaron Roth,et al.  A learning theory approach to noninteractive database privacy , 2011, JACM.

[9]  Josep Domingo-Ferrer,et al.  A Critique of k-Anonymity and Some of Its Enhancements , 2008, 2008 Third International Conference on Availability, Reliability and Security.

[10]  Benjamin C. M. Fung,et al.  Publishing set-valued data via differential privacy , 2011, Proc. VLDB Endow..

[11]  Pierangela Samarati,et al.  Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression , 1998 .

[12]  Josep Domingo-Ferrer,et al.  Practical Data-Oriented Microaggregation for Statistical Disclosure Control , 2002, IEEE Trans. Knowl. Data Eng..

[13]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[14]  David Sánchez,et al.  Ontology-based semantic similarity: A new feature-based approach , 2012, Expert Syst. Appl..

[15]  Moni Naor,et al.  On the complexity of differentially private data release: efficient algorithms and hardness results , 2009, STOC '09.

[16]  Josep Domingo-Ferrer,et al.  Statistical Disclosure Control: Hundepool/Statistical Disclosure Control , 2012 .

[17]  Josep Domingo-Ferrer,et al.  Ordinal, Continuous and Heterogeneous k-Anonymity Through Microaggregation , 2005, Data Mining and Knowledge Discovery.