Differentially private data publishing via cross-moment microaggregation

Abstract

Differential privacy is one of the most prominent privacy notions in the field of anonymization. However, its strong privacy guarantees often come at the cost of significantly degraded utility of the protected data. To cope with this, numerous mechanisms have been studied that reduce the sensitivity of the data and hence the noise required to satisfy this notion. In this paper, we present a generalization of classical microaggregation, in which the aggregated records are replaced by the group mean and additional statistical measures, and we evaluate it as a sensitivity reduction mechanism. We propose an anonymization methodology for numerical microdata in which the target of protection is a data set microaggregated in this generalized way, and disclosure risk is limited through differential privacy via record-level perturbation. Specifically, we describe three anonymization algorithms in which microaggregation can be applied either to entire records or to groups of attributes independently. Our theoretical analysis computes the sensitivities of the first two central cross moments; we apply fundamental results from matrix perturbation theory to derive sensitivity bounds on the eigenvalues and eigenvectors of the covariance and coskewness matrices. Our extensive experimental evaluation shows that data utility can be enhanced significantly for medium to large microaggregation group sizes. For this range of group sizes, we find experimental evidence that our approach can provide not only higher utility but also higher privacy than traditional microaggregation.
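To make the sensitivity-reduction idea concrete, the following is a minimal sketch of the classical, mean-only baseline that the abstract generalizes; it is not one of the three algorithms proposed in the paper. It assumes a single numerical attribute with known bounds, groups records by sorted order into groups of at least k records, and adds Laplace noise to each group mean. Under these assumptions, changing one record shifts the sorted values by at most the attribute range in total, so the L1 sensitivity of the vector of group means is at most (upper - lower) / k. The function name, its parameters, and the choice of the Laplace mechanism are illustrative assumptions.

```python
import numpy as np


def dp_univariate_microaggregation(values, k, epsilon, lower, upper, rng=None):
    """Hypothetical sketch: mean-only microaggregation plus Laplace noise.

    Assumes one bounded attribute and sorted-order grouping; this is an
    illustration of sensitivity reduction, not the paper's method.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.clip(np.asarray(values, dtype=float), lower, upper)
    n = x.size
    if k < 1 or n < k:
        raise ValueError("need 1 <= k <= number of records")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")

    # Sort records and cut them into n // k groups; the last group
    # absorbs the remainder, so every group has at least k records.
    order = np.argsort(x, kind="stable")
    n_groups = n // k
    starts = np.arange(n_groups) * k
    sizes = np.diff(np.append(starts, n))
    means = np.add.reduceat(x[order], starts) / sizes

    # Assumed L1 sensitivity of the group means: (upper - lower) / k,
    # i.e. k times smaller than for the raw attribute values.
    scale = (upper - lower) / (k * epsilon)
    noisy_means = means + rng.laplace(0.0, scale, size=n_groups)

    # Replace every record by the noisy mean of its group.
    group_of_sorted = np.repeat(np.arange(n_groups), sizes)
    protected = np.empty(n)
    protected[order] = noisy_means[group_of_sorted]
    return protected
```

The noise scale shrinks as k grows, which is the utility gain the abstract refers to; the price is the information lost by replacing records with a group mean, which the generalized approach tries to offset by retaining additional statistical measures per group.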
