Fingerprinting codes and the price of approximate differential privacy

We show new lower bounds on the sample complexity of (ε, δ)-differentially private algorithms that accurately answer large sets of counting queries. A counting query on a database $D \in (\{0,1\}^d)^n$ has the form "What fraction of the individual records in the database satisfy the property q?" We show that in order to answer an arbitrary set Q of $|Q| \gg nd$ counting queries on D to within error ±α, it is necessary that $n \geq \tilde{\Omega}\left(\frac{\sqrt{d}\,\log|Q|}{\alpha^2\varepsilon}\right)$. This bound is optimal up to poly-logarithmic factors, as demonstrated by the Private Multiplicative Weights algorithm (Hardt and Rothblum, FOCS'10). It is also the first lower bound to show that the sample complexity required for both accuracy and (ε, δ)-differential privacy is asymptotically larger than what is required for accuracy alone, which is $O(\log|Q|/\alpha^2)$. In addition, we show that our lower bound holds for the specific case of k-way marginal queries (where $|Q| = 2^k\binom{d}{k}$) when α is a constant. Our results rely on the existence of short fingerprinting codes (Boneh and Shaw, CRYPTO'95; Tardos, STOC'03), which we show are closely connected to the sample complexity of differentially private data release. We also give a new method for combining certain types of sample complexity lower bounds into stronger lower bounds.
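To make the objects in this lower bound concrete, here is a minimal Python sketch (an illustration, not code from the paper) that evaluates a counting query on a database in $(\{0,1\}^d)^n$ and enumerates the k-way marginal queries, checking that there are $2^k\binom{d}{k}$ of them. It assumes the usual definition of a k-way marginal: pick k coordinates and a target bit pattern, and ask what fraction of records match that pattern on those coordinates. The names `counting_query` and `k_way_marginals` are hypothetical, and no privacy mechanism is implemented here; the sketch only computes exact answers.

```python
from itertools import combinations, product
from math import comb
from typing import Callable, List, Sequence, Tuple

Record = Tuple[int, ...]          # one row of D: a vector in {0,1}^d
Query = Callable[[Record], bool]  # the property q from the abstract


def counting_query(db: Sequence[Record], q: Query) -> float:
    """Return the fraction of records in db that satisfy the predicate q."""
    return sum(1 for x in db if q(x)) / len(db)


def k_way_marginals(d: int, k: int) -> List[Query]:
    """Enumerate every k-way marginal query on {0,1}^d (hypothetical helper).

    A k-way marginal is given by a set S of k coordinates and a pattern
    v in {0,1}^k; it is satisfied by the records x whose restriction x_S
    equals v. Queries are grouped by S, with the 2^k patterns for each S
    adjacent in the returned list.
    """
    queries: List[Query] = []
    for S in combinations(range(d), k):
        for v in product((0, 1), repeat=k):
            # Bind S and v as defaults so each predicate keeps its own values.
            queries.append(lambda x, S=S, v=v: all(x[i] == b for i, b in zip(S, v)))
    return queries


if __name__ == "__main__":
    db = [(1, 0, 1), (1, 1, 0), (0, 0, 1), (1, 0, 0)]  # n = 4, d = 3
    Q = k_way_marginals(d=3, k=2)
    assert len(Q) == 2**2 * comb(3, 2)  # |Q| = 2^k * (d choose k) = 12
    # Only (1, 0, 1) has x_0 = 1 and x_2 = 1, so the exact answer is 1/4.
    print(counting_query(db, lambda x: x[0] == 1 and x[2] == 1))  # → 0.25
```

Because the 2^k marginals for a fixed coordinate set S partition the records, their exact answers always sum to 1; a private algorithm answering all of Q to within ±α must return noisy values that respect this only approximately.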

[1] Jonathan Ullman. Answering $n^{2+o(1)}$ counting queries with differential privacy is hard, 2012, STOC '13.

[2] Aaron Roth, et al. Privately releasing conjunctions and the statistical query barrier, 2010, STOC '11.

[3] Katrina Ligett, et al. A Simple and Practical Algorithm for Differentially Private Data Release, 2010, NIPS.

[4] Amos Beimel, et al. Private Learning and Sanitization: Pure vs. Approximate Differential Privacy, 2013, APPROX-RANDOM.

[5] Amos Beimel, et al. Characterizing the sample complexity of private learners, 2013, ITCS '13.

[6] Cynthia Dwork, et al. Privacy, accuracy, and consistency too: a holistic solution to contingency table release, 2007, PODS.

[7] Irit Dinur, et al. Revealing information while preserving privacy, 2003, PODS.

[8] Guy N. Rothblum, et al. A Multiplicative Weights Mechanism for Privacy-Preserving Data Analysis, 2010, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.

[9] Kunal Talwar, et al. On the geometry of differential privacy, 2009, STOC '10.

[10] Tim Roughgarden, et al. Interactive privacy via the median mechanism, 2009, STOC '10.

[11] Anindya De, et al. Lower Bounds in Differential Privacy, 2011, TCC.

[12] Adam D. Smith, et al. The price of privately releasing contingency tables and the spectra of random matrices with correlated rows, 2010, STOC '10.

[13] Guy N. Rothblum, et al. Boosting and Differential Privacy, 2010, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.

[14] Andrew Wan, et al. Faster private release of marginals on small databases, 2013, ITCS.

[15] Cynthia Dwork, et al. Privacy-Preserving Datamining on Vertically Partitioned Databases, 2004, CRYPTO.

[16] Aaron Roth. Differential Privacy and the Fat-Shattering Dimension of Linear Queries, 2010, APPROX-RANDOM.

[17] Amos Beimel, et al. Bounds on the sample complexity for private learning and private data release, 2010, Machine Learning.

[18] Cynthia Dwork, et al. The price of privacy and the limits of LP decoding, 2007, STOC '07.

[19] Dan Boneh, et al. Collusion-Secure Fingerprinting for Digital Data, 1998, IEEE Trans. Inf. Theory.

[20] Jonathan Ullman, et al. PCPs and the Hardness of Generating Private Synthetic Data, 2011, TCC.

[21] Moni Naor, et al. Our Data, Ourselves: Privacy Via Distributed Noise Generation, 2006, EUROCRYPT.

[22] Cynthia Dwork, et al. Practical privacy: the SuLQ framework, 2005, PODS.

[23] B. Barak, et al. A study of privacy and fairness in sensitive data analysis, 2011.

[24] Moni Naor, et al. On the complexity of differentially private data release: efficient algorithms and hardness results, 2009, STOC '09.

[25] Moni Naor, et al. Traitor tracing with constant size ciphertext, 2008, CCS.

[26] Aaron Roth, et al. Iterative Constructions and Private Data Release, 2011, TCC.

[27] Cynthia Dwork, et al. Calibrating Noise to Sensitivity in Private Data Analysis, 2006, TCC.

[28] Sofya Raskhodnikova, et al. What Can We Learn Privately?, 2008, 2008 49th Annual IEEE Symposium on Foundations of Computer Science.

[29] Aleksandar Nikolov, et al. The geometry of differential privacy: the sparse and approximate cases, 2012, STOC '13.

[30] Aaron Roth, et al. A learning theory approach to noninteractive database privacy, 2011, JACM.

[31] Aggelos Kiayias, et al. Robust fingerprinting codes: a near optimal construction, 2010, DRM '10.

[32] Aleksandar Nikolov, et al. Efficient Algorithms for Privately Releasing Marginals via Convex Relaxations, 2013, Discret. Comput. Geom.

[33] Cynthia Dwork, et al. New Efficient Attacks on Statistical Disclosure Control Mechanisms, 2008, CRYPTO.

[34] Justin Thaler, et al. Faster Algorithms for Privately Releasing Marginals, 2012, ICALP.