A learning theory approach to non-interactive database privacy

In this article, we demonstrate that, ignoring computational constraints, it is possible to release synthetic databases that are useful for accurately answering large classes of queries while preserving differential privacy. Specifically, we give a mechanism that privately releases synthetic data useful for answering a class of queries over a discrete domain with error that grows as a function of the size of the smallest net approximately representing the answers to that class of queries. We show that this in particular implies a mechanism for counting queries that gives error guarantees that grow only with the VC-dimension of the class of queries, which itself grows at most logarithmically with the size of the query class. We also show that it is not possible to release even simple classes of queries (such as intervals and their generalizations) over continuous domains with worst-case utility guarantees while preserving differential privacy. In response to this, we consider a relaxation of the utility guarantee and give a privacy preserving polynomial time algorithm that for any halfspace query will provide an answer that is accurate for some small perturbation of the query. This algorithm does not release synthetic data, but instead another data structure capable of representing an answer for each query. We also give an efficient algorithm for releasing synthetic data for the class of interval queries and axis-aligned rectangles of constant dimension over discrete domains.

[1]  J. Davenport Editor , 1960 .

[2]  Piotr Indyk,et al.  Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.

[3]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[4]  Alexander J. Smola,et al.  Learning with kernels , 1998 .

[5]  Anupam Gupta,et al.  An elementary proof of the Johnson-Lindenstrauss Lemma , 1999 .

[6]  Peter L. Bartlett,et al.  Learning in Neural Networks: Theoretical Foundations , 1999 .

[7]  Peter L. Bartlett,et al.  Neural Network Learning - Theoretical Foundations , 1999 .

[8]  Bernhard Schölkopf,et al.  Learning with kernels , 2001 .

[9]  Alexandre V. Evfimievski,et al.  Limiting privacy breaches in privacy preserving data mining , 2003, PODS.

[10]  Dimitris Achlioptas,et al.  Database-friendly random projections: Johnson-Lindenstrauss with binary coins , 2003, J. Comput. Syst. Sci..

[11]  Irit Dinur,et al.  Revealing information while preserving privacy , 2003, PODS.

[12]  Cynthia Dwork,et al.  Privacy-Preserving Datamining on Vertically Partitioned Databases , 2004, CRYPTO.

[13]  Cynthia Dwork,et al.  Practical privacy: the SuLQ framework , 2005, PODS.

[14]  Moni Naor,et al.  Our Data, Ourselves: Privacy Via Distributed Noise Generation , 2006, EUROCRYPT.

[15]  Cynthia Dwork,et al.  Differential Privacy , 2006, ICALP.

[16]  Dan Suciu,et al.  Journal of the ACM , 2006 .

[17]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.

[18]  Santosh S. Vempala,et al.  Kernels as features: On kernels, margins, and low-dimensional mappings , 2006, Machine Learning.

[19]  Sofya Raskhodnikova,et al.  Smooth sensitivity and sampling in private data analysis , 2007, STOC '07.

[20]  Dan Suciu,et al.  The Boundary Between Privacy and Utility in Data Publishing , 2007, VLDB.

[21]  Cynthia Dwork,et al.  The price of privacy and the limits of LP decoding , 2007, STOC '07.

[22]  Cynthia Dwork,et al.  Privacy, accuracy, and consistency too: a holistic solution to contingency table release , 2007, PODS.

[23]  Kunal Talwar,et al.  Mechanism Design via Differential Privacy , 2007, 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07).

[24]  A. Blum,et al.  A learning theory approach to non-interactive database privacy , 2008, STOC.

[25]  Sofya Raskhodnikova,et al.  What Can We Learn Privately? , 2008, 2008 49th Annual IEEE Symposium on Foundations of Computer Science.

[26]  Cynthia Dwork,et al.  New Efficient Attacks on Statistical Disclosure Control Mechanisms , 2008, CRYPTO.

[27]  Rynson W. H. Lau,et al.  Knowledge and Data Engineering for e-Learning Special Issue of IEEE Transactions on Knowledge and Data Engineering , 2008 .

[28]  Moni Naor,et al.  On the complexity of differentially private data release: efficient algorithms and hardness results , 2009, STOC '09.

[29]  Dan Suciu,et al.  Boosting the Accuracy of Differentially-Private Queries Through Consistency , 2009, ArXiv.

[30]  Kunal Talwar,et al.  On the geometry of differential privacy , 2009, STOC '10.

[31]  Dan Suciu,et al.  Boosting the accuracy of differentially private histograms through consistency , 2009, Proc. VLDB Endow..

[32]  Aaron Roth Differential Privacy and the Fat-Shattering Dimension of Linear Queries , 2010, APPROX-RANDOM.

[33]  Adam D. Smith,et al.  The price of privately releasing contingency tables and the spectra of random matrices with correlated rows , 2010, STOC '10.

[34]  Guy N. Rothblum,et al.  A Multiplicative Weights Mechanism for Privacy-Preserving Data Analysis , 2010, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.

[35]  Andrew McGregor,et al.  Optimizing linear counting queries under differential privacy , 2009, PODS.

[36]  Tim Roughgarden,et al.  Interactive privacy via the median mechanism , 2009, STOC '10.

[37]  Guy N. Rothblum,et al.  Boosting and Differential Privacy , 2010, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.

[38]  Jonathan Ullman,et al.  PCPs and the Hardness of Generating Private Synthetic Data , 2011, TCC.

[39]  J. Gehrke,et al.  Differential Privacy via Wavelet Transforms , 2011, IEEE Trans. Knowl. Data Eng..

[40]  Aaron Roth,et al.  Privately releasing conjunctions and the statistical query barrier , 2010, STOC '11.

[41]  Gerome Miklau,et al.  Efficient Batch Query Answering Under Differential Privacy , 2011, ArXiv.

[42]  Anindya De,et al.  Lower Bounds in Differential Privacy , 2011, TCC.

[43]  Gerome Miklau,et al.  An Adaptive Mechanism for Accurate Query Answering under Differential Privacy , 2012, Proc. VLDB Endow..

[44]  Gerome Miklau,et al.  Measuring the achievable error of query sets under differential privacy , 2012, ArXiv.

[45]  Katrina Ligett,et al.  A Simple and Practical Algorithm for Differentially Private Data Release , 2010, NIPS.

[46]  Rocco A. Servedio,et al.  Private data release via learning thresholds , 2011, SODA.

[47]  Aaron Roth,et al.  Iterative Constructions and Private Data Release , 2011, TCC.

[48]  Aleksandar Nikolov,et al.  Optimal private halfspace counting via discrepancy , 2012, STOC '12.