How to Use Heuristics for Differential Privacy

We develop theory for using heuristics to solve computationally hard problems in differential privacy. Heuristic approaches have enjoyed tremendous success in machine learning, where performance can be evaluated empirically. Privacy guarantees, however, cannot be evaluated empirically: they must be proven, without making heuristic assumptions. We show that learning problems over broad classes of functions (those that have polynomially sized universal identification sets) can be solved privately and efficiently, assuming the existence of a non-private oracle for solving the same problem. Our first algorithm yields a privacy guarantee that is contingent on the correctness of the oracle. We then give a reduction that applies to a class of heuristics we call certifiable, which lets us convert oracle-dependent privacy guarantees into worst-case privacy guarantees that hold even when the heuristic standing in for the oracle might fail in adversarial ways. Finally, we consider classes of functions for which both the class and its dual class have small universal identification sets. These include most classes of simple Boolean functions studied in the PAC learning literature, such as conjunctions, disjunctions, parities, and discrete halfspaces. We show that there is an efficient algorithm for privately constructing synthetic data for any such class, given a non-private learning oracle. In particular, this gives the first oracle-efficient algorithm for privately generating synthetic data for contingency tables. The most intriguing question left open by our work is whether every problem that can be solved differentially privately can also be solved privately by an oracle-efficient algorithm. While we do not resolve this question, we give a barrier result suggesting that any generic oracle-efficient reduction must fall outside a natural class of algorithms (which includes the algorithms given in this paper).
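A universal identification set for a class is a set of inputs on which every function in the class has a distinct labeling, so any two distinct functions disagree on at least one of its points. The sketch below makes this concrete for parities chi_S(x) = XOR of x_i over i in S on {0,1}^d, where the d standard basis vectors suffice because chi_S(e_i) = 1 exactly when i is in S. It then pictures the oracle model from the abstract: a learner that touches the class only through a non-private weighted ERM oracle (here a brute-force solver standing in for a heuristic) and randomizes its single oracle call by appending fake examples on the identification set with Laplace-distributed weights. All function names are hypothetical, the perturbation is an illustrative follow-the-perturbed-leader-style choice rather than the paper's exact algorithm, and no privacy calibration of `noise_scale` is claimed.

```python
import itertools
import random

Point = tuple[int, ...]
Example = tuple[Point, int, float]  # (input, label, weight)


def parity(subset: frozenset[int], x: Point) -> int:
    """chi_S(x): XOR of the coordinates of x indexed by S."""
    return sum(x[i] for i in subset) % 2


def all_parities(d: int) -> list[frozenset[int]]:
    """Every parity over {0,1}^d, indexed by its subset S of {0, ..., d-1}."""
    return [frozenset(c) for r in range(d + 1) for c in itertools.combinations(range(d), r)]


def basis(d: int) -> list[Point]:
    """The standard basis vectors e_0, ..., e_{d-1}."""
    return [tuple(int(i == j) for j in range(d)) for i in range(d)]


def is_identification_set(points: list[Point], d: int) -> bool:
    """True iff the labelings of `points` are pairwise distinct across all 2^d parities."""
    signatures = {tuple(parity(s, x) for x in points) for s in all_parities(d)}
    return len(signatures) == 2 ** d


def erm_oracle(data: list[Example], d: int) -> frozenset[int]:
    """Hypothetical non-private oracle: exact weighted 0/1-loss ERM by brute force.
    Weights may be negative; the oracle simply minimizes the weighted sum."""
    return min(all_parities(d), key=lambda s: sum(w * (parity(s, x) != y) for x, y, w in data))


def laplace(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample as a difference of two exponentials; 0.0 if scale == 0."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def perturbed_learner(data: list[tuple[Point, int]], d: int,
                      noise_scale: float, rng: random.Random) -> frozenset[int]:
    """One oracle call on the real data plus one fake example per basis vector.
    A fake example (e_i, 1) with weight w adds w to the loss of every parity S with
    i not in S, so the oracle minimizes empirical loss plus a random linear function
    of S's labeling of the identification set."""
    real = [(x, y, 1.0) for x, y in data]
    fake = [(e, 1, laplace(noise_scale, rng)) for e in basis(d)]
    return erm_oracle(real + fake, d)


if __name__ == "__main__":
    d = 3
    print(is_identification_set(basis(d), d))  # True
    target = frozenset({0, 2})
    data = [(x, parity(target, x)) for x in itertools.product((0, 1), repeat=d)]
    print(sorted(perturbed_learner(data, d, noise_scale=1.0, rng=random.Random(0))))
```

Dropping any basis vector breaks identification: for example, the empty parity and chi_{2} agree on e_0 and e_1. With `noise_scale=0` the learner reduces to plain ERM, which is exactly the non-private oracle.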
