Differentially Private Release and Learning of Threshold Functions

We prove new upper and lower bounds on the sample complexity of (ε, δ) differentially private algorithms for releasing approximate answers to threshold functions. A threshold function c_x over a totally ordered domain X evaluates to c_x(y) = 1 if y ≤ x, and evaluates to 0 otherwise. We give the first nontrivial lower bound for releasing thresholds with (ε, δ) differential privacy, showing that the task is impossible over an infinite domain X and, moreover, requires sample complexity n ≥ Ω(log* |X|), which grows with the size of the domain. Inspired by the techniques used to prove this lower bound, we give an algorithm for releasing thresholds with n ≤ 2^((1+o(1)) log* |X|) samples. This improves the previous best upper bound of 8^((1+o(1)) log* |X|) (Beimel et al., RANDOM '13). Our sample complexity upper and lower bounds also apply to the tasks of learning distributions with respect to Kolmogorov distance and of properly PAC learning thresholds with differential privacy. The lower bound gives the first separation between the sample complexity of properly learning a concept class with (ε, δ) differential privacy and learning without privacy. For properly learning thresholds in ℓ dimensions, this lower bound extends to n ≥ Ω(ℓ · log* |X|). To obtain our results, we give reductions in both directions between releasing and properly learning thresholds and the simpler interior point problem. Given a database D of elements from X, the interior point problem asks for an element of X between the smallest and largest elements in D. We introduce new recursive constructions for bounding the sample complexity of the interior point problem, as well as further reductions and techniques for proving impossibility results for other basic problems in differential privacy.
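As a concrete illustration of the two objects defined above, the sketch below implements the threshold function c_x and solves the interior point problem with the standard exponential mechanism. This is not the algorithm of this paper: the exponential mechanism needs on the order of (log |X|)/ε samples, whereas the recursive construction described above achieves 2^((1+o(1)) log* |X|). The function names, the toy domain, and the choice of ε = 1 in the usage example are illustrative assumptions.

```python
import math
import random

def threshold(x, y):
    """Threshold function c_x over a totally ordered domain: 1 if y <= x, else 0."""
    return 1 if y <= x else 0

def interior_point_exp_mech(db, domain, epsilon):
    """epsilon-DP interior point via the exponential mechanism (illustrative sketch,
    not this paper's algorithm).

    Each candidate z in the domain is scored by
        score(z) = min(#{d in db : d <= z}, #{d in db : d >= z}),
    which is positive exactly when z lies between min(db) and max(db) and has
    sensitivity 1. Sampling z with probability proportional to
    exp(epsilon * score(z) / 2) therefore satisfies epsilon-differential privacy.
    """
    def score(z):
        below = sum(1 for d in db if d <= z)
        above = sum(1 for d in db if d >= z)
        return min(below, above)

    scores = [score(z) for z in domain]
    m = max(scores)  # shift scores for numerical stability; does not change the distribution
    weights = [math.exp(epsilon * (s - m) / 2.0) for s in scores]
    return random.choices(domain, weights=weights, k=1)[0]

# Example usage on a toy domain X = {0, ..., 31}.
db = [3, 7, 7, 12, 20]
domain = list(range(32))
z = interior_point_exp_mech(db, domain, epsilon=1.0)
print(threshold(12, z))  # 1 iff the returned point is <= 12
```

With roughly (4/ε) · log |X| samples in the database, the returned point lies between the smallest and largest database elements with high probability; the point of the lower bound above is that some dependence on |X| (namely log* |X|) is unavoidable even under (ε, δ) differential privacy.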

[1] Peter Kulchyski, 2015.

[2] Moni Naor et al. On the complexity of differentially private data release: efficient algorithms and hardness results. STOC, 2009.

[3] Sofya Raskhodnikova et al. What Can We Learn Privately? FOCS, 2008.

[4] Constantinos Daskalakis et al. Faster and Sample Near-Optimal Algorithms for Proper Learning Mixtures of Gaussians. COLT, 2013.

[5] Jonathan Ullman et al. Fingerprinting codes and the price of approximate differential privacy. STOC, 2013.

[6] Aleksandar Nikolov et al. The geometry of differential privacy: the sparse and approximate cases. STOC, 2013.

[7] Peter L. Bartlett et al. Neural Network Learning: Theoretical Foundations, 1999.

[8] Aaron Roth et al. Privately releasing conjunctions and the statistical query barrier. STOC, 2011.

[9] Aleksandar Nikolov et al. Optimal private halfspace counting via discrepancy. STOC, 2012.

[10] Kamalika Chaudhuri et al. The Large Margin Mechanism for Differentially Private Maximization. NIPS, 2014.

[11] J. Kiefer et al. Asymptotic Minimax Character of the Sample Distribution Function and of the Classical Multinomial Estimator, 1956.

[12] Aaron Roth et al. A learning theory approach to noninteractive database privacy. JACM, 2011.

[13] Vladimir Vapnik and Alexey Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities, 1971.

[14] Dan Boneh et al. Collusion-Secure Fingerprinting for Digital Data. IEEE Trans. Inf. Theory, 1998.

[15] Leslie G. Valiant et al. A general lower bound on the number of examples needed for learning. COLT, 1988.

[16] Anindya De et al. Lower Bounds in Differential Privacy. TCC, 2011.

[17] Rocco A. Servedio et al. Learning k-Modal Distributions via Testing. Theory Comput., 2011.

[18] Cynthia Dwork et al. Differential privacy and robust statistics. STOC, 2009.

[19] Guy N. Rothblum et al. Boosting and Differential Privacy. FOCS, 2010.

[20] Cynthia Dwork et al. Privacy-Preserving Datamining on Vertically Partitioned Databases. CRYPTO, 2004.

[21] Moni Naor et al. Differential privacy under continual observation. STOC, 2010.

[22] P. Massart. The Tight Constant in the Dvoretzky-Kiefer-Wolfowitz Inequality, 1990.

[23] Amos Beimel et al. Bounds on the sample complexity for private learning and private data release. Machine Learning, 2010.

[24] Raef Bassily et al. Differentially Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds. arXiv:1405.7085, 2014.

[25] Amos Beimel et al. Characterizing the sample complexity of private learners. ITCS, 2013.

[26] Irit Dinur et al. Revealing information while preserving privacy. PODS, 2003.

[27] Leslie G. Valiant et al. A theory of the learnable. STOC, 1984.

[28] Cynthia Dwork et al. Calibrating Noise to Sensitivity in Private Data Analysis. TCC, 2006.

[29] Rocco A. Servedio et al. Testing k-Modal Distributions: Optimal Algorithms via Reductions. SODA, 2011.

[30] Aaron Roth et al. A learning theory approach to non-interactive database privacy. STOC, 2008.

[31] Kunal Talwar et al. Mechanism Design via Differential Privacy. FOCS, 2007.

[32] David Haussler et al. Learnability and the Vapnik-Chervonenkis dimension. JACM, 1989.

[33] Amos Beimel et al. Private Learning and Sanitization: Pure vs. Approximate Differential Privacy. APPROX-RANDOM, 2013.

[34] Li Zhang et al. Analyze gauss: optimal bounds for privacy-preserving principal component analysis. STOC, 2014.

[35] Aditya Bhaskara et al. Unconditional differentially private mechanisms for linear queries. STOC, 2012.

[36] Cynthia Dwork et al. Practical privacy: the SuLQ framework. PODS, 2005.

[37] Kamalika Chaudhuri et al. Sample Complexity Bounds for Differentially Private Learning. COLT, 2011.

[38] Guy N. Rothblum et al. A Multiplicative Weights Mechanism for Privacy-Preserving Data Analysis. FOCS, 2010.

[39] Kunal Talwar et al. On the geometry of differential privacy. STOC, 2010.

[40] Vitaly Feldman et al. Sample Complexity Bounds on Differentially Private Learning via Communication Complexity. SIAM J. Comput., 2014.

[41] Raef Bassily et al. Private Empirical Risk Minimization, Revisited. arXiv, 2014.