Characterizing the sample complexity of private learners

In 2008, Kasiviswanathan et al. defined private learning as a combination of PAC learning and differential privacy [15]. Informally, a private learner is applied to a collection of labeled individual records and outputs a hypothesis while preserving the privacy of each individual. Kasiviswanathan et al. gave a generic construction of private learners for (finite) concept classes, with sample complexity logarithmic in the size of the concept class. This sample complexity is higher than what is needed for non-private learners, leaving open the possibility that the sample complexity of private learning may sometimes be significantly higher than that of non-private learning. We give a combinatorial characterization of the sample size necessary and sufficient to privately learn a class of concepts. This characterization is analogous to the well-known characterization of the sample complexity of non-private learning in terms of the VC dimension of the concept class. We introduce the notion of a probabilistic representation of a concept class; our new complexity measure, RepDim, is the size of the smallest probabilistic representation of the concept class. We show that if a concept class C can be privately learned with sample complexity m, then RepDim(C) = O(m), and, conversely, that there exists a private learning algorithm for C with sample complexity O(RepDim(C)). We further demonstrate that a similar characterization holds for the database size needed for privately computing a large class of optimization problems, and for the well-studied problem of private data release.
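
The generic construction for finite concept classes mentioned above is, at its core, hypothesis selection via the exponential mechanism of McSherry and Talwar [2]: score each hypothesis by its empirical error and sample one with probability that decays exponentially in that error. The sketch below is a minimal illustration of that idea under stated assumptions, not the paper's construction verbatim; the threshold class, the sample, and the value of epsilon in the usage section are made up for the demonstration.

```python
# Minimal sketch: a differentially private learner for a finite hypothesis
# class via the exponential mechanism (assumed technique, per [2] and [15]).
import math
import random

def private_learn(H, sample, epsilon):
    """Pick a hypothesis from the finite class H via the exponential mechanism.

    H       -- list of hypotheses, each a function x -> label
    sample  -- list of (x, y) labeled examples
    epsilon -- differential-privacy parameter
    """
    # Score = minus the number of misclassified examples. Changing one
    # example changes this count by at most 1, so the score has sensitivity 1.
    scores = [-sum(1 for x, y in sample if h(x) != y) for h in H]
    # Exponential mechanism: output h with probability proportional to
    # exp(epsilon * score / 2); subtract the max score for numerical stability.
    best = max(scores)
    weights = [math.exp(epsilon * (s - best) / 2) for s in scores]
    return random.choices(H, weights=weights, k=1)[0]

# Illustrative usage (hypothetical data): learn a threshold over {0, ..., 9}.
if __name__ == "__main__":
    H = [lambda x, t=t: int(x >= t) for t in range(10)]
    target = H[4]
    sample = [(x, target(x)) for x in (random.randrange(10) for _ in range(200))]
    h = private_learn(H, sample, epsilon=1.0)
    print("empirical error:", sum(h(x) != y for x, y in sample) / len(sample))
```

The standard utility analysis of the exponential mechanism explains the logarithmic sample complexity: with m samples and a sensitivity-1 score, the selected hypothesis has, with probability at least 1 - beta, empirical error within (2/(epsilon*m)) * ln(|H|/beta) of the best hypothesis in H, so m = O(log|H|) samples suffice for fixed accuracy and privacy parameters.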

[1] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. EuroCOLT, 1995.

[2] Frank McSherry and Kunal Talwar. Mechanism Design via Differential Privacy. 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS '07), 2007.

[3] Wassily Hoeffding. Probability Inequalities for Sums of Bounded Random Variables. 1963.

[4] Andrzej Ehrenfeucht, David Haussler, Michael Kearns, and Leslie G. Valiant. A general lower bound on the number of examples needed for learning. COLT '88, 1988.

[5] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating Noise to Sensitivity in Private Data Analysis. TCC, 2006.

[6] Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. JACM, 1989.

[7] Cynthia Dwork. The Differential Privacy Frontier (Extended Abstract). TCC, 2009.

[8] Avrim Blum, Katrina Ligett, and Aaron Roth. A learning theory approach to noninteractive database privacy. JACM, 2011.

[9] Robert E. Schapire. The Strength of Weak Learnability. Machine Learning, 1990.

[10] Kamalika Chaudhuri and Daniel Hsu. Sample Complexity Bounds for Differentially Private Learning. COLT, 2011.

[11] Cynthia Dwork, Guy N. Rothblum, and Salil P. Vadhan. Boosting and Differential Privacy. 2010 IEEE 51st Annual Symposium on Foundations of Computer Science (FOCS), 2010.

[12] Cynthia Dwork. The Differential Privacy Frontier. 2009.

[13] Herman Chernoff. A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations. 1952.

[14] Amos Beimel, Shiva Prasad Kasiviswanathan, and Kobbi Nissim. Bounds on the sample complexity for private learning and private data release. Machine Learning, 2010.

[15] Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. What Can We Learn Privately? 2008 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2008.

[16] Amos Beimel, Paz Carmi, Kobbi Nissim, and Enav Weinreb. Private approximation of search problems. STOC '06, 2006.

[17] Leslie G. Valiant. A theory of the learnable. STOC '84, 1984.

[18] Avrim Blum, Cynthia Dwork, Frank McSherry, and Kobbi Nissim. Practical privacy: the SuLQ framework. PODS, 2005.

[19] Vladimir N. Vapnik and Alexey Ya. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. 1971.