A Statistical Framework for Differential Privacy

One goal of statistical privacy research is to construct a data release mechanism that protects individual privacy while preserving information content. An example is a random mechanism that takes an input database X and outputs a random database Z according to a distribution Qn(⋅|X). Differential privacy is a particular privacy requirement, developed by computer scientists, in which Qn(⋅|X) is required to be insensitive to changes in one data point in X. This makes it difficult to infer from Z whether a given individual is in the original database X. We consider differential privacy from a statistical perspective and examine several data-release mechanisms that satisfy the differential privacy requirement. We show that it is useful to compare these schemes by computing the rate of convergence of distributions and densities constructed from the released data. We also study a general privacy method, called the exponential mechanism, introduced by McSherry and Talwar (2007). We show that the accuracy of this method is intimately linked to the rate at which the empirical distribution concentrates in a small ball around the true distribution.
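The exponential mechanism named in the abstract can be illustrated with a minimal Python sketch over a finite set of candidate outputs. It assumes the commonly used formulation in which a candidate z is released with probability proportional to exp(ε · u(X, z) / (2Δ)), where u is a utility function and Δ bounds how much u can change when one data point in X changes. The function names (release_probabilities, sample_release, count_utility), the toy binary database, and the value ε = 0.5 are illustrative assumptions and do not come from the paper.

```python
import math
import random


def release_probabilities(data, candidates, utility, sensitivity, epsilon):
    """Return the release probability of each candidate.

    Assumed form: P(z | data) is proportional to
    exp(epsilon * utility(data, z) / (2 * sensitivity)).
    """
    scores = [epsilon * utility(data, z) / (2.0 * sensitivity) for z in candidates]
    top = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def sample_release(data, candidates, utility, sensitivity, epsilon, rng=None):
    """Draw one candidate at random according to release_probabilities."""
    rng = rng or random.Random()
    probs = release_probabilities(data, candidates, utility, sensitivity, epsilon)
    return rng.choices(candidates, weights=probs, k=1)[0]


def count_utility(data, z):
    """Toy utility: negative distance between z and the true count of ones.

    Changing one entry of a 0/1 database moves the count by at most 1,
    so this utility has sensitivity 1.
    """
    return -abs(sum(data) - z)


# Two hypothetical databases that differ in exactly one record.
x = [1, 0, 1, 1, 0]
x_neighbor = [0, 0, 1, 1, 0]
candidates = list(range(len(x) + 1))
epsilon = 0.5

p = release_probabilities(x, candidates, count_utility, 1.0, epsilon)
q = release_probabilities(x_neighbor, candidates, count_utility, 1.0, epsilon)

# Largest ratio of release probabilities across the two databases.
worst_ratio = max(max(a / b, b / a) for a, b in zip(p, q))
print(worst_ratio <= math.exp(epsilon))
```

In this toy setting the ratio check reflects the insensitivity requirement described above: changing one record shifts each candidate's unnormalized weight and the normalizing constant by a factor of at most exp(ε/2) each, so no output becomes more than exp(ε) times more or less likely.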

[1]  S. L. Warner,et al.  Randomized response: a survey technique for eliminating evasive answer bias , 1965, Journal of the American Statistical Association.

[2]  P. Révész,et al.  A new method to prove Strassen type laws of invariance principle. I , 1975 .

[3]  P. Révész,et al.  A new method to prove Strassen type laws of invariance principle. II , 1975 .

[4]  George T. Duncan,et al.  Disclosure-Limited Data Dissemination , 1986 .

[5]  J. T. Hwang Multiplicative Errors-in-Variables Models with Applications to Recent Data Released by the U.S. Department of Energy , 1986 .

[6]  D. Lambert,et al.  The Risk of Disclosure for Microdata , 1989 .

[7]  George T. Duncan,et al.  Enhancing Access to Microdata while Protecting Confidentiality: Prospects for the Future , 1991 .

[8]  J. Norwood [Enhancing Access to Microdata While Protecting Confidentiality: Prospects for the Future]: Comment , 1991 .

[9]  D. W. Scott,et al.  Multivariate Density Estimation, Theory, Practice and Visualization , 1992 .

[10]  Correcting the negativity of high-order kernel density estimators , 1993 .

[11]  Lianfen Qian,et al.  Nonparametric Curve Estimation: Methods, Theory, and Applications , 1999, Technometrics.

[12]  Q. Shao,et al.  Gaussian processes: Inequalities, small ball probabilities and applications , 2001 .

[13]  William E. Winkler,et al.  Multiplicative Noise for Masking Continuous Data , 2001 .

[14]  Latanya Sweeney,et al.  k-Anonymity: A Model for Protecting Privacy , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[15]  Benny Pinkas,et al.  Cryptographic techniques for privacy-preserving data mining , 2002, SKDD.

[16]  Alexandre V. Evfimievski,et al.  Privacy preserving mining of association rules , 2002, Inf. Syst..

[17]  Irit Dinur,et al.  Revealing information while preserving privacy , 2003, PODS.

[18]  V. Bentkus On the dependence of the Berry–Esseen bound on dimension , 2003 .

[19]  Stephen E. Fienberg,et al.  Data Swapping: Variations on a Theme by Dalenius and Reiss , 2004, Privacy in Statistical Databases.

[20]  Cynthia Dwork,et al.  Privacy-Preserving Datamining on Vertically Partitioned Databases , 2004, CRYPTO.

[21]  Xiaodong Lin,et al.  Privacy preserving regression modelling via distributed computation , 2004, KDD.

[22]  Jerome P. Reiter Estimating Risks of Identification Disclosure in Microdata , 2005 .

[23]  Cynthia Dwork,et al.  Practical privacy: the SuLQ framework , 2005, PODS.

[24]  Cynthia Dwork,et al.  Differential Privacy , 2006, ICALP.

[25]  Roger Barga,et al.  Proceedings of the 22nd International Conference on Data Engineering Workshops, ICDE 2006, 3-7 April 2006, Atlanta, GA, USA , 2006, ICDE Workshops.

[26]  Ashwin Machanavajjhala,et al.  l-Diversity: Privacy Beyond k-Anonymity , 2006, ICDE.

[27]  Samir Khuller,et al.  Achieving anonymity via clustering , 2006, PODS '06.

[28]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.

[29]  Joan Feigenbaum,et al.  Secure multiparty computation of approximations , 2001, TALG.

[30]  Artak Amirbekyan,et al.  Privacy-preserving regression algorithms , 2007 .

[31]  Sofya Raskhodnikova,et al.  Smooth sensitivity and sampling in private data analysis , 2007, STOC '07.

[32]  Aleksandra Slavkovic,et al.  "Secure" Logistic Regression of Horizontally and Vertically Partitioned Distributed Databases , 2007 .

[33]  Ninghui Li,et al.  t-Closeness: Privacy Beyond k-Anonymity and l-Diversity , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[34]  Cynthia Dwork,et al.  The price of privacy and the limits of LP decoding , 2007, STOC '07.

[35]  Cynthia Dwork,et al.  Privacy, accuracy, and consistency too: a holistic solution to contingency table release , 2007, PODS.

[36]  Kunal Talwar,et al.  Mechanism Design via Differential Privacy , 2007, 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07).

[37]  Angelika Rohde,et al.  Confidence Sets for the Optimal Approximating Model - Bridging a Gap between Adaptive Point Estimation and Confidence Regions , 2008 .

[38]  Adam D. Smith,et al.  Composition attacks and auxiliary information in data privacy , 2008, KDD.

[39]  A. Blum,et al.  A learning theory approach to non-interactive database privacy , 2008, STOC.

[40]  Stephen E. Fienberg,et al.  Random orthogonal matrix masking methodology for microdata release , 2008, Int. J. Inf. Comput. Secur..

[41]  Sofya Raskhodnikova,et al.  What Can We Learn Privately? , 2008, 2008 49th Annual IEEE Symposium on Foundations of Computer Science.

[42]  Ashwin Machanavajjhala,et al.  Privacy: Theory meets Practice on the Map , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[43]  Adam D. Smith,et al.  Efficient, Differentially Private Point Estimators , 2008, ArXiv.

[44]  Dan Suciu,et al.  Relationship privacy: output perturbation for queries with joins , 2009, PODS.

[45]  Haim Kaplan,et al.  Private coresets , 2009, STOC '09.

[46]  Moni Naor,et al.  On the complexity of differentially private data release: efficient algorithms and hardness results , 2009, STOC '09.

[47]  Tim Roughgarden,et al.  Universally utility-maximizing privacy mechanisms , 2008, STOC '09.

[48]  Adaptive Confidence Sets for the Optimal Approximating Model , 2008, 0802.3276.

[49]  Cynthia Dwork,et al.  Differential privacy and robust statistics , 2009, STOC '09.