Privately Answering Classification Queries in the Agnostic PAC Model

We revisit the problem of differentially private release of classification queries. In this problem, the goal is to design an algorithm that can accurately answer a sequence of classification queries based on a private training set while ensuring differential privacy. We formally study this problem in the agnostic PAC model and derive a new upper bound on the private sample complexity. Our results improve upon those obtained in recent work [BTT18] for the agnostic PAC setting. In particular, we give an improved construction that yields a tighter upper bound on the sample complexity. Moreover, unlike [BTT18], our accuracy guarantee does not involve any blow-up in the approximation error associated with the given hypothesis class. Given any hypothesis class with VC-dimension $d$, we show that our construction can privately answer up to $m$ classification queries with average excess error $\alpha$ using a private sample of size $\approx \frac{d}{\alpha^2}\,\max\left(1, \sqrt{m}\,\alpha^{3/2}\right)$. Using recent results on private learning with auxiliary public data, we extend our construction to show that one can privately answer any number of classification queries with average excess error $\alpha$ using a private sample of size $\approx \frac{d}{\alpha^2}\,\max\left(1, \sqrt{d}\,\alpha\right)$. When $\alpha = O\left(\frac{1}{\sqrt{d}}\right)$, our private sample complexity bound is essentially optimal.
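
To unpack the two regimes of these bounds (reading directly off the $\max$ terms), the first bound reduces to the familiar agnostic rate whenever the number of queries satisfies $m \le \alpha^{-3}$, and the second bound does so whenever $\alpha \le 1/\sqrt{d}$:
$$
\frac{d}{\alpha^2}\,\max\!\left(1,\sqrt{m}\,\alpha^{3/2}\right)
=
\begin{cases}
\dfrac{d}{\alpha^{2}}, & m \le \alpha^{-3},\\[6pt]
\dfrac{d\sqrt{m}}{\sqrt{\alpha}}, & m > \alpha^{-3},
\end{cases}
\qquad
\frac{d}{\alpha^2}\,\max\!\left(1,\sqrt{d}\,\alpha\right)
=
\begin{cases}
\dfrac{d}{\alpha^{2}}, & \alpha \le \tfrac{1}{\sqrt{d}},\\[6pt]
\dfrac{d^{3/2}}{\alpha}, & \alpha > \tfrac{1}{\sqrt{d}}.
\end{cases}
$$
In particular, when $\alpha \le 1/\sqrt{d}$ the second bound coincides with the standard non-private agnostic PAC sample complexity of $\Theta\!\left(\frac{d}{\alpha^2}\right)$, which is why the bound is essentially optimal in that regime.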

[1] Shai Ben-David, et al. Understanding Machine Learning: From Theory to Algorithms, 2014.

[2] Martín Abadi, et al. Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data, 2016, ICLR.

[3] Adam D. Smith, et al. Differentially Private Feature Selection via Stability Arguments, and the Robustness of the Lasso, 2013, COLT.

[4] Kobbi Nissim, et al. Differentially Private Release and Learning of Threshold Functions, 2015, FOCS.

[5] Cynthia Dwork, et al. Calibrating Noise to Sensitivity in Private Data Analysis, 2006, TCC.

[6] Noga Alon, et al. Limits of Private Learning with Access to Public Data, 2019, NeurIPS.

[7] Amos Beimel, et al. Private Learning and Sanitization: Pure vs. Approximate Differential Privacy, 2013, APPROX-RANDOM.

[8] Cynthia Dwork, et al. Calibrating Noise to Sensitivity in Private Data Analysis, 2016, J. Priv. Confidentiality.

[9] Moni Naor, et al. Our Data, Ourselves: Privacy Via Distributed Noise Generation, 2006, EUROCRYPT.

[10] Sofya Raskhodnikova, et al. Smooth sensitivity and sampling in private data analysis, 2007, STOC.

[11] Ninghui Li, et al. On sampling, anonymization, and differential privacy or, k-anonymization meets differential privacy, 2012, ASIACCS.

[12] Noga Alon, et al. Private PAC learning implies finite Littlestone dimension, 2019, STOC.

[13] Sofya Raskhodnikova, et al. What Can We Learn Privately?, 2008, FOCS.

[14] Kunal Talwar, et al. Mechanism Design via Differential Privacy, 2007, FOCS.

[15] Shai Ben-David, et al. Does Unlabeled Data Provably Help? Worst-case Analysis of the Sample Complexity of Semi-Supervised Learning, 2008, COLT.

[16] Vitaly Feldman, et al. Privacy-preserving Prediction, 2018, COLT.

[17] Mikhail Belkin, et al. Learning privately from multiparty data, 2016, ICML.

[18] Vitaly Feldman, et al. PAC learning with stable and private predictions, 2020, COLT.

[19] Norbert Sauer. On the Density of Families of Sets, 1972, J. Comb. Theory A.

[20] Aaron Roth, et al. The Algorithmic Foundations of Differential Privacy, 2014, Found. Trends Theor. Comput. Sci.

[21] Raef Bassily, et al. Model-Agnostic Private Learning, 2018, NeurIPS.

[22] Úlfar Erlingsson, et al. Scalable Private Learning with PATE, 2018, ICLR.

[23] Amos Beimel, et al. Learning Privately with Labeled and Unlabeled Examples, 2015, SODA.