Data-Driven Approach for Evaluating Risk of Disclosure and Utility in Differentially Private Data Release

Differential privacy (DP) is a popular technique for protecting individual privacy while releasing data for public use. However, few research efforts have been devoted to balancing the corresponding risk of data disclosure (RoD) against data utility. In this paper, we propose data-driven approaches for differentially private data release that evaluate RoD, and we provide algorithms to assess whether a differentially private synthetic dataset offers sufficient privacy. Besides privacy, the utility of the synthetic dataset is an important metric for differentially private data release. We therefore also propose a data-driven algorithm, based on curve fitting, to measure and predict the error in statistical results incurred by the random noise added to the original dataset. Finally, we present an algorithm for choosing an appropriate privacy budget ε that balances privacy and utility.
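
The abstract outlines three ingredients: noise addition under differential privacy, curve fitting to predict the error a given privacy budget induces, and a rule for selecting ε that meets a utility requirement. The minimal Python sketch below illustrates that pipeline under stated assumptions; it is not the paper's implementation. The count query, the error model a/ε + b, the number of trials, and the utility threshold max_tolerable_error are all assumptions made for illustration.

    # Minimal, illustrative sketch (not the authors' algorithm): release a count
    # via the Laplace mechanism, fit a curve to the empirical error as a function
    # of epsilon, and pick the smallest epsilon whose predicted error stays under
    # an assumed utility threshold.
    import numpy as np
    from scipy.optimize import curve_fit

    rng = np.random.default_rng(0)
    data = rng.integers(0, 2, size=10_000)   # toy binary dataset
    true_count = data.sum()                  # statistic to release: a count query
    sensitivity = 1.0                        # global sensitivity of a count query

    def laplace_count(epsilon):
        """Release the count via the Laplace mechanism with budget epsilon."""
        return true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

    def empirical_error(epsilon, trials=200):
        """Mean absolute error of the noisy count over repeated releases."""
        return np.mean([abs(laplace_count(epsilon) - true_count) for _ in range(trials)])

    # Measure error at a few budgets, then fit error(eps) ~ a/eps + b (assumed form).
    eps_grid = np.array([0.05, 0.1, 0.2, 0.5, 1.0, 2.0])
    errors = np.array([empirical_error(e) for e in eps_grid])

    def model(eps, a, b):
        return a / eps + b

    (a, b), _ = curve_fit(model, eps_grid, errors)

    # Choose the smallest epsilon whose predicted error meets the utility requirement.
    max_tolerable_error = 5.0                # assumed utility threshold
    candidates = np.linspace(0.01, 2.0, 400)
    feasible = candidates[model(candidates, a, b) <= max_tolerable_error]
    chosen_eps = feasible.min() if feasible.size else None
    print(f"fitted error model: {a:.2f}/eps + {b:.2f}, chosen epsilon: {chosen_eps}")

For the Laplace mechanism, the expected absolute error of a count query is exactly sensitivity/ε, which motivates the 1/ε model form; inverting the fitted curve then yields the smallest ε whose predicted error stays within the tolerance.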
