The Algorithmic Foundations of Differential Privacy

The problem of privacy-preserving data analysis has a long history spanning multiple disciplines. As electronic data about individuals become increasingly detailed, and as technology enables ever more powerful collection and curation of these data, the need increases for a robust, meaningful, and mathematically rigorous definition of privacy, together with a computationally rich class of algorithms that satisfy this definition. Differential Privacy is such a definition.

After motivating and discussing the meaning of differential privacy, the preponderance of this monograph is devoted to fundamental techniques for achieving differential privacy and to the application of these techniques in creative combinations, using the query-release problem as an ongoing example. A key point is that, by rethinking the computational goal, one can often obtain far better results than would be achieved by methodically replacing each step of a non-private computation with a differentially private implementation. Despite some astonishingly powerful computational results, there are still fundamental limitations, not just on what can be achieved with differential privacy but on what can be achieved with any method that protects against a complete breakdown in privacy. Virtually all the algorithms discussed herein maintain differential privacy against adversaries of arbitrary computational power; some are computationally intensive, others are efficient. Computational complexity for both the adversary and the algorithm is discussed.

We then turn from fundamentals to applications other than query release, discussing differentially private methods for mechanism design and machine learning. The vast majority of the literature on differentially private algorithms considers a single, static database that is subject to many analyses; differential privacy in other models, including distributed databases and computations on data streams, is also discussed.

Finally, we note that this work is meant as a thorough introduction to the problems and techniques of differential privacy, but it is not intended to be an exhaustive survey: there is by now a vast amount of work in differential privacy, and we can cover only a small portion of it.
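As a concrete illustration of the kind of fundamental technique the monograph develops, the sketch below shows the Laplace mechanism: a numeric query is answered with noise drawn from a Laplace distribution whose scale is calibrated to the query's sensitivity divided by the privacy parameter epsilon. This is a minimal, illustrative sketch, not code from the monograph; the function name and parameters are our own.

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    """Answer a numeric query with epsilon-differential privacy by
    adding Laplace noise of scale sensitivity / epsilon."""
    rng = rng or np.random.default_rng()
    scale = sensitivity / epsilon
    return true_answer + rng.laplace(loc=0.0, scale=scale)

# A counting query ("how many records satisfy P?") has sensitivity 1:
# adding or removing one individual changes the count by at most 1.
private_count = laplace_mechanism(true_answer=1000, sensitivity=1, epsilon=0.5)
```

Smaller epsilon means stronger privacy but noisier answers; the noise is unbiased, so averaging many independent releases of the same query would concentrate around the true answer (at a corresponding cost in cumulative privacy loss under composition).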
