Boosting and Differential Privacy

Boosting is a general method for improving the accuracy of learning algorithms. We use boosting to construct improved {\em privacy-preserving synopses} of an input database. These are data structures that yield, for a given set $\mathcal{Q}$ of queries over an input database, reasonably accurate estimates of the responses to every query in $\mathcal{Q}$, even when the number of queries is much larger than the number of rows in the database. Given a {\em base synopsis generator} that takes a distribution on $\mathcal{Q}$ and produces a ``weak'' synopsis yielding ``good'' answers for a majority of the weight in $\mathcal{Q}$, our {\em Boosting for Queries} algorithm obtains a synopsis that is good for all of $\mathcal{Q}$. We ensure privacy for the rows of the database, but the boosting is performed on the {\em queries}. We also provide the first synopsis generators for arbitrary sets of arbitrary low-sensitivity queries, {\it i.e.}, queries whose answers do not vary much under the addition or deletion of a single row. In the execution of our algorithm, certain tasks, each incurring some privacy loss, are performed many times. To analyze the cumulative privacy loss, we obtain an $O(\epsilon^2)$ bound on the {\em expected} privacy loss from a single $\epsilon$-differentially private mechanism. Combining this with evolution-of-confidence arguments from the literature, we obtain stronger bounds on the expected cumulative privacy loss due to multiple mechanisms, each of which provides $\epsilon$-differential privacy or one of its relaxations, and each of which operates on (potentially) different, adaptively chosen, databases.
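As a concrete illustration of the composition bounds discussed above, the following sketch (function names are ours, chosen for illustration) compares the naive bound for $k$ adaptive $\epsilon$-differentially private mechanisms, $k\epsilon$, with the advanced composition bound $\epsilon' = \sqrt{2k\ln(1/\delta')}\,\epsilon + k\epsilon(e^{\epsilon}-1)$, which holds with an additional $\delta'$ slack in the failure probability:

```python
import math

def basic_composition(eps: float, k: int) -> float:
    """Naive bound: k adaptive eps-DP mechanisms compose to (k * eps)-DP."""
    return k * eps

def advanced_composition(eps: float, k: int, delta_prime: float) -> float:
    """Advanced composition bound: the k-fold adaptive composition of
    eps-DP mechanisms is (eps', delta')-DP with
    eps' = sqrt(2k ln(1/delta')) * eps + k * eps * (e^eps - 1).
    The second term reflects the O(eps^2) bound on the expected
    privacy loss of a single eps-DP mechanism."""
    sqrt_term = math.sqrt(2 * k * math.log(1 / delta_prime)) * eps
    expected_loss_term = k * eps * math.expm1(eps)  # expm1(x) = e^x - 1
    return sqrt_term + expected_loss_term

# Example: 100 adaptive mechanisms, each 0.1-DP, with delta' = 1e-6.
eps, k, delta_prime = 0.1, 100, 1e-6
print(basic_composition(eps, k))                   # 10.0
print(advanced_composition(eps, k, delta_prime))   # roughly 6.31
```

For small $\epsilon$ and large $k$, the advanced bound grows roughly as $\sqrt{k}\,\epsilon$ rather than $k\epsilon$, which is what makes the many iterations of the boosting algorithm affordable in terms of cumulative privacy loss.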
