Model-Agnostic Private Learning

We design differentially private learning algorithms that are agnostic to the learning model and assume access only to a limited amount of unlabeled public data. First, we give a new differentially private algorithm for answering a sequence of $m$ online classification queries (given by a sequence of $m$ unlabeled public feature vectors) based on a private training set. Our private algorithm follows the subsample-and-aggregate paradigm: an arbitrary non-private learner is trained on disjoint subsets of the private training set, and, for each classification query, the votes of the resulting ensemble of classifiers are aggregated in a differentially private fashion. Our private aggregation is based on a novel combination of the distance-to-instability framework [Smith & Thakurta 2013] and the sparse-vector technique [Dwork et al. 2009, Hardt & Rothblum 2010]. We show that our algorithm makes conservative use of the privacy budget. In particular, if the underlying non-private learner yields classification error at most $\alpha \in (0, 1)$, then our construction answers more queries, by at least a factor of $1/\alpha$ in some cases, than what is implied by a straightforward application of the advanced composition theorem for differential privacy. Next, we apply the knowledge-transfer technique to construct a private learner that outputs a classifier, which can be used to answer an unlimited number of queries. In the PAC model, we analyze our construction and prove upper bounds on the sample complexity for both the realizable and the non-realizable cases. As in the non-private setting, our bounds are completely characterized by the VC dimension of the concept class.
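To make the aggregation step concrete, the sketch below is a hedged illustration of this kind of private vote aggregation, not the paper's exact algorithm or noise calibration. It takes the per-query votes of an ensemble of teachers trained on disjoint chunks of the private data, measures the distance to instability of the plurality label (half the gap between the top and runner-up vote counts), and releases stable answers through a sparse-vector-style noisy threshold. The names `num_labels`, `gap_threshold`, and `max_unstable`, as well as the Laplace noise scales, are illustrative assumptions introduced here for exposition.

```python
import numpy as np

def private_vote_aggregation(teacher_preds, num_labels, eps, gap_threshold, max_unstable):
    """Privately aggregate ensemble votes for a stream of classification queries.

    teacher_preds: iterable of 1-D integer arrays; element t of each array is
    the label predicted by teacher t (a non-private model trained on its own
    disjoint chunk of the private data) for that query.
    Returns one answer per query: a label for stable queries, None otherwise.
    """
    rng = np.random.default_rng()
    answers = []
    unstable_count = 0
    # Sparse-vector style: one noisy threshold, resampled only after a "hit".
    noisy_threshold = gap_threshold + rng.laplace(scale=2.0 / eps)
    for votes in teacher_preds:
        counts = np.bincount(votes, minlength=num_labels)
        second_best, best = np.sort(counts)[-2:]
        # Distance to instability of the plurality vote: how many teachers
        # would have to change their label before the argmax could flip.
        gap = (best - second_best) / 2.0
        if gap + rng.laplace(scale=4.0 / eps) >= noisy_threshold:
            # Stable query: release the plurality label. In the sparse-vector
            # accounting (applied to the negated gap), these answers are not
            # individually charged to the privacy budget.
            answers.append(int(np.argmax(counts)))
        else:
            # Unstable query: abstain (or answer via a separately budgeted
            # mechanism) and count it toward the sparse-vector cutoff.
            answers.append(None)
            unstable_count += 1
            noisy_threshold = gap_threshold + rng.laplace(scale=2.0 / eps)
            if unstable_count >= max_unstable:
                break  # allowance for unstable queries exhausted
    return answers
```

In the knowledge-transfer step described in the abstract, the labels produced this way for public unlabeled examples would then be used to train a non-private student classifier, which can answer arbitrarily many future queries without further access to the private data.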

[1] Martín Abadi, et al. Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data, 2016, ICLR.

[2] Amos Beimel, et al. Bounds on the Sample Complexity for Private Learning and Private Data Release, 2010, TCC.

[3] Moni Naor, et al. On the complexity of differentially private data release: efficient algorithms and hardness results, 2009, STOC '09.

[4] Amos Beimel, et al. Characterizing the sample complexity of private learners, 2013, ITCS '13.

[5] Amos Beimel, et al. Private Learning and Sanitization: Pure vs. Approximate Differential Privacy, 2013, APPROX-RANDOM.

[6] Vitaly Feldman, et al. Privacy-preserving Prediction, 2018, COLT.

[7] Leo Breiman. Bagging Predictors, 1996, Machine Learning.

[8] Kobbi Nissim, et al. Differentially Private Release and Learning of Threshold Functions, 2015, FOCS '15.

[9] Ian Goodfellow, et al. Deep Learning with Differential Privacy, 2016, CCS.

[10] Raef Bassily, et al. Model-Agnostic Private Learning via Stability, 2018, arXiv.

[11] Úlfar Erlingsson, et al. Scalable Private Learning with PATE, 2018, ICLR.

[12] Moni Naor, et al. Our Data, Ourselves: Privacy Via Distributed Noise Generation, 2006, EUROCRYPT.

[13] Daniel Kifer, et al. Private Convex Empirical Risk Minimization and High-dimensional Regression, 2012, COLT.

[14] Umesh V. Vazirani, et al. An Introduction to Computational Learning Theory, 1994.

[15] Cynthia Dwork, et al. Calibrating Noise to Sensitivity in Private Data Analysis, 2006, TCC.

[16] Mikhail Belkin, et al. Learning privately from multiparty data, 2016, ICML.

[17] Aaron Roth, et al. The Algorithmic Foundations of Differential Privacy, 2014, Found. Trends Theor. Comput. Sci.

[18] Anand D. Sarwate, et al. Differentially Private Empirical Risk Minimization, 2009, J. Mach. Learn. Res.

[19] Raef Bassily, et al. Differentially Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds, 2014, arXiv:1405.7085.

[20] Adam D. Smith, et al. Differentially Private Feature Selection via Stability Arguments, and the Robustness of the Lasso, 2013, COLT.

[21] Kamalika Chaudhuri, et al. Sample Complexity Bounds for Differentially Private Learning, 2011, COLT.

[22] Guy N. Rothblum, et al. A Multiplicative Weights Mechanism for Privacy-Preserving Data Analysis, 2010, FOCS '10.

[23] Shai Ben-David, et al. Understanding Machine Learning: From Theory to Algorithms, 2014.

[24] S. Boucheron, et al. Theory of classification: a survey of some recent advances, 2005.

[25] Sofya Raskhodnikova, et al. Smooth sensitivity and sampling in private data analysis, 2007, STOC '07.

[26] Li Zhang, et al. Nearly Optimal Private LASSO, 2015, NIPS.