Dual Query: Practical Private Query Release for High Dimensional Data

We present a practical, differentially private algorithm for answering a large number of queries on high dimensional datasets. Like all algorithms for this task, ours necessarily has worst-case complexity exponential in the dimension of the data. However, our algorithm packages the computationally hard step into a concisely defined integer program, which can be solved nonprivately using standard solvers. We prove accuracy and privacy theorems for our algorithm, and then demonstrate experimentally that our algorithm performs well in practice. For example, our algorithm can efficiently and accurately answer millions of queries on the Netflix dataset, which has over 17;000 attributes; this is an improvement on the state of the art by multiple orders of magnitude.

[1]  Michael Kearns,et al.  Efficient noise-tolerant learning from statistical queries , 1993, STOC.

[2]  Yoav Freund,et al.  Game theory, on-line prediction and boosting , 1996, COLT '96.

[3]  Cynthia Dwork,et al.  Practical privacy: the SuLQ framework , 2005, PODS.

[4]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.

[5]  Cynthia Dwork,et al.  Privacy, accuracy, and consistency too: a holistic solution to contingency table release , 2007, PODS.

[6]  Kunal Talwar,et al.  Mechanism Design via Differential Privacy , 2007, 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS'07).

[7]  A. Blum,et al.  A learning theory approach to non-interactive database privacy , 2008, STOC.

[8]  Vitaly Shmatikov,et al.  Robust De-anonymization of Large Sparse Datasets , 2008, 2008 IEEE Symposium on Security and Privacy (sp 2008).

[9]  Sofya Raskhodnikova,et al.  What Can We Learn Privately? , 2008, 2008 49th Annual IEEE Symposium on Foundations of Computer Science.

[10]  Kamalika Chaudhuri,et al.  Privacy-preserving logistic regression , 2008, NIPS.

[11]  Moni Naor,et al.  On the complexity of differentially private data release: efficient algorithms and hardness results , 2009, STOC '09.

[12]  Guy N. Rothblum,et al.  A Multiplicative Weights Mechanism for Privacy-Preserving Data Analysis , 2010, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.

[13]  Andrew McGregor,et al.  Optimizing linear counting queries under differential privacy , 2009, PODS.

[14]  Tim Roughgarden,et al.  Interactive privacy via the median mechanism , 2009, STOC '10.

[15]  Guy N. Rothblum,et al.  Boosting and Differential Privacy , 2010, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.

[16]  Jonathan Ullman,et al.  PCPs and the Hardness of Generating Private Synthetic Data , 2011, TCC.

[17]  Kamalika Chaudhuri,et al.  Sample Complexity Bounds for Differentially Private Learning , 2011, COLT.

[18]  Aaron Roth,et al.  Privately releasing conjunctions and the statistical query barrier , 2010, STOC '11.

[19]  Anand D. Sarwate,et al.  Differentially Private Empirical Risk Minimization , 2009, J. Mach. Learn. Res..

[20]  Ling Huang,et al.  Learning in a Large Function Space: Privacy-Preserving Mechanisms for SVM Learning , 2009, J. Priv. Confidentiality.

[21]  Gerome Miklau,et al.  An Adaptive Mechanism for Accurate Query Answering under Differential Privacy , 2012, Proc. VLDB Endow..

[22]  Anand D. Sarwate,et al.  Near-optimal Differentially Private Principal Components , 2012, NIPS.

[23]  Katrina Ligett,et al.  A Simple and Practical Algorithm for Differentially Private Data Release , 2010, NIPS.

[24]  S. Vadhan,et al.  Faster Algorithms for Privately Releasing Marginals , 2012, ICALP.

[25]  Daniel Kifer,et al.  Private Convex Empirical Risk Minimization and High-dimensional Regression , 2012, COLT 2012.

[26]  Aaron Roth,et al.  Iterative Constructions and Private Data Release , 2011, TCC.

[27]  Sanjeev Arora,et al.  The Multiplicative Weights Update Method: a Meta-Algorithm and Applications , 2012, Theory Comput..

[28]  Aaron Roth,et al.  A learning theory approach to non-interactive database privacy , 2008, STOC.

[29]  Martin J. Wainwright,et al.  Local privacy and statistical minimax rates , 2013, 2013 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[30]  Adam D. Smith,et al.  (Nearly) Optimal Algorithms for Private Online Learning in Full-information and Bandit Settings , 2013, NIPS.

[31]  Divesh Srivastava,et al.  Accurate and efficient private release of datacubes and contingency tables , 2012, 2013 IEEE 29th International Conference on Data Engineering (ICDE).

[32]  Amos Beimel,et al.  Characterizing the sample complexity of private learners , 2013, ITCS '13.

[33]  Jonathan Ullman,et al.  Answering n{2+o(1)} counting queries with differential privacy is hard , 2012, STOC '13.

[34]  Aleksandar Nikolov,et al.  The geometry of differential privacy: the sparse and approximate cases , 2012, STOC '13.

[35]  Justin Hsu,et al.  Differential privacy for the analyst via private equilibrium computation , 2012, STOC '13.

[36]  Kamalika Chaudhuri,et al.  A Stability-based Validation Procedure for Differentially Private Machine Learning , 2013, NIPS.

[37]  Jun Zhang,et al.  PrivBayes: private data release via bayesian networks , 2014, SIGMOD Conference.

[38]  Aleksandar Nikolov,et al.  Using Convex Relaxations for Efficiently and Privately Releasing Marginals , 2014, SoCG.

[39]  Jonathan Ullman Answering n2+o(1) Counting Queries with Differential Privacy is Hard , 2016, SIAM J. Comput..