Sample-efficient proper PAC learning with approximate differential privacy

In this paper we prove that the sample complexity of properly learning a class of Littlestone dimension d with approximate differential privacy is Õ(d^6), ignoring privacy and accuracy parameters. This result answers a question of Bun et al. (FOCS 2020) by improving upon their upper bound of 2^{O(d)} on the sample complexity. Prior to our work, finiteness of the sample complexity for privately learning a class of finite Littlestone dimension was only known for improper private learners, and the fact that our learner is proper answers another question of Bun et al., which was also asked by Bousquet et al. (NeurIPS 2020). Using machinery developed by Bousquet et al., we then show that the sample complexity of sanitizing a binary hypothesis class is at most polynomial in its Littlestone dimension and dual Littlestone dimension. This implies that a class is sanitizable if and only if it has finite Littlestone dimension. An important ingredient of our proofs is a new property of binary hypothesis classes that we call irreducibility, which may be of independent interest.
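For readers who want a concrete handle on the parameter d, the following is a minimal sketch of how the Littlestone dimension of a small finite class could be computed by brute force from its standard recursive definition. It is an illustration only and is not taken from the paper: the function name `littlestone_dimension` and the representation of each hypothesis as a tuple of 0/1 labels (one per domain point) are assumptions made here, and the running time is exponential in general.

```python
from functools import lru_cache


def littlestone_dimension(hypotheses):
    """Brute-force Littlestone dimension of a finite binary class.

    `hypotheses` is an iterable of equal-length 0/1 tuples; entry x of a
    tuple is that hypothesis's label on domain point x (an assumed encoding).
    """

    @lru_cache(maxsize=None)
    def ldim(cls):
        if not cls:
            return -1  # convention for the empty class
        if len(cls) == 1:
            return 0  # a single hypothesis cannot be forced into a mistake
        n = len(next(iter(cls)))
        best = 0
        for x in range(n):
            # Split the class by the label it assigns to point x.
            zeros = frozenset(h for h in cls if h[x] == 0)
            ones = cls - zeros
            if zeros and ones:
                # A mistake tree rooted at x needs both subtrees, so its
                # depth is one more than the shallower of the two sides.
                best = max(best, 1 + min(ldim(zeros), ldim(ones)))
        return best

    return ldim(frozenset(hypotheses))


# Example: the 8 threshold functions h_t(x) = 1[x >= t] on 7 points.
thresholds = [tuple(1 if x >= t else 0 for x in range(7)) for t in range(8)]
print(littlestone_dimension(thresholds))
```

For thresholds the dimension grows only logarithmically with the number of hypotheses, which is why the gap between a polynomial bound in d and an exponential one matters even for simple classes.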
