Calibrating Noise to Sensitivity in Private Data Analysis

We continue a line of research initiated in [10, 11] on privacy-preserving statistical databases. Consider a trusted server that holds a database of sensitive information. Given a query function f mapping databases to reals, the so-called true answer is the result of applying f to the database. To protect privacy, the true answer is perturbed by the addition of random noise generated according to a carefully chosen distribution, and this response, the true answer plus noise, is returned to the user. Previous work focused on the case of noisy sums, in which f = Σ_i g(x_i), where x_i denotes the i-th row of the database and g maps database rows to [0,1]. We extend the study to general functions f, proving that privacy can be preserved by calibrating the standard deviation of the noise according to the sensitivity of the function f. Roughly speaking, the sensitivity is the maximum amount by which changing a single row of the database can change the output of f. The new analysis shows that for several particular applications substantially less noise is needed than was previously understood to be the case. The first step is a very clean characterization of privacy in terms of indistinguishability of transcripts. Additionally, we obtain separation results showing the increased value of interactive sanitization mechanisms over non-interactive ones.
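
As a concrete illustration of the calibration described above, the sketch below adds Laplace noise with scale Δf/ε, where the sensitivity Δf is the maximum of |f(D) − f(D′)| over databases D, D′ differing in a single row. This is a minimal rendering of the mechanism the abstract describes; the function name, the toy database, and the choice ε = 0.1 are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def laplace_mechanism(true_answer: float, sensitivity: float, epsilon: float) -> float:
    """Return the true answer perturbed with Laplace noise whose scale is
    calibrated to sensitivity / epsilon, as the paper proposes."""
    scale = sensitivity / epsilon  # b = Delta f / epsilon
    return true_answer + np.random.laplace(loc=0.0, scale=scale)

# Example: a noisy-sum query f(x) = sum_i g(x_i) with g mapping rows to [0,1]
# has sensitivity 1, since changing one row moves the sum by at most 1.
database = [0.2, 0.9, 1.0, 0.0, 0.7]  # toy rows, already mapped through g
true_answer = sum(database)
noisy_answer = laplace_mechanism(true_answer, sensitivity=1.0, epsilon=0.1)
print(noisy_answer)
```

Note that the noise scale grows as ε shrinks (stronger privacy) and as the sensitivity grows; low-sensitivity functions can therefore be answered with substantially less noise, which is the main quantitative point of the paper.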

[1] S. L. Warner. Randomized response: a survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 1965.

[2] Dorothy E. Denning. Secure statistical databases with random sample queries. ACM TODS, 1980.

[3] Silvio Micali, et al. Probabilistic Encryption. J. Comput. Syst. Sci., 1984.

[4] Nabil R. Adam, et al. Security-control methods for statistical databases: a comparative study. ACM Comput. Surv., 1989.

[5] Charu C. Aggarwal, et al. On the design and quantification of privacy preserving data mining algorithms. PODS, 2001.

[6] Yehuda Lindell, et al. Privacy Preserving Data Mining. Journal of Cryptology, 2002.

[7] Latanya Sweeney. k-Anonymity: A Model for Protecting Privacy. Int. J. Uncertain. Fuzziness Knowl. Based Syst., 2002.

[8] Alexandre V. Evfimievski, et al. Limiting privacy breaches in privacy preserving data mining. PODS, 2003.

[9] Eli Ben-Sasson, et al. Some 3CNF properties are hard to test. STOC, 2003.

[10] Irit Dinur, et al. Revealing information while preserving privacy. PODS, 2003.

[11] C. Dwork, et al. On the Utility of Privacy-Preserving Histograms. 2004.

[12] Cynthia Dwork, et al. Privacy-Preserving Datamining on Vertically Partitioned Databases. CRYPTO, 2004.

[13] Hoeteck Wee, et al. Toward Privacy in Public Databases. TCC, 2005.

[14] Cynthia Dwork, et al. Practical privacy: the SuLQ framework. PODS, 2005.

[15] Moni Naor, et al. Our Data, Ourselves: Privacy Via Distributed Noise Generation. EUROCRYPT, 2006.

[16] Cynthia Dwork. Differential Privacy. ICALP, 2006.

[17] Sofya Raskhodnikova, et al. Smooth sensitivity and sampling in private data analysis. STOC, 2007.

[18] Kunal Talwar, et al. Mechanism Design via Differential Privacy. 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2007.

[19] L. Wasserman, et al. A Statistical Framework for Differential Privacy. arXiv:0811.2501, 2008.

[20] Adam D. Smith, et al. Composition attacks and auxiliary information in data privacy. KDD, 2008.

[21] A. Blum, et al. A learning theory approach to non-interactive database privacy. STOC, 2008.

[22] Ashwin Machanavajjhala, et al. Privacy: Theory meets Practice on the Map. IEEE 24th International Conference on Data Engineering (ICDE), 2008.