Limits of Private Learning with Access to Public Data

We consider learning problems where the training set consists of two types of examples: private and public. The goal is to design a learning algorithm that satisfies differential privacy only with respect to the private examples. This setting interpolates between private learning (where all examples are private) and classical learning (where all examples are public). We study the limits of learning in this setting in terms of private and public sample complexities. We show that any hypothesis class of VC-dimension $d$ can be agnostically learned up to an excess error of $\alpha$ using only (roughly) $d/\alpha$ public examples and $d/\alpha^2$ private labeled examples. This result holds even when the public examples are unlabeled. This gives a quadratic improvement over the standard $d/\alpha^2$ upper bound on the public sample complexity (in which case the private examples can be ignored altogether, provided the public examples are labeled). Furthermore, we give a nearly matching lower bound, which we prove via a generic reduction from this setting to that of private learning without public data.
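In rough summary, and suppressing logarithmic factors as well as the dependence on the privacy and confidence parameters (a simplification; the symbols $n_{\mathrm{pub}}$ and $n_{\mathrm{priv}}$ are introduced here purely for exposition), the bounds stated above read as

\[
  n_{\mathrm{pub}} \;=\; \tilde{O}\!\left(\frac{d}{\alpha}\right) \ \text{(unlabeled public examples suffice)},
  \qquad
  n_{\mathrm{priv}} \;=\; \tilde{O}\!\left(\frac{d}{\alpha^{2}}\right) \ \text{(labeled private examples)},
\]

whereas learning from labeled public data alone requires $O(d/\alpha^{2})$ examples under the standard agnostic bound; hence the improvement in the public sample complexity is quadratic in $1/\alpha$.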
