PCPs and the Hardness of Generating Synthetic Data

Assuming the existence of one-way functions, we show that there is no polynomial-time differentially private algorithm $\mathcal{A}$ that takes a database $D\in (\{0,1\}^d)^n$ and outputs a “synthetic database” $\hat{D}$ all of whose two-way marginals are approximately equal to those of $D$. (A two-way marginal is the fraction of database rows $x\in \{0,1\}^d$ with a given pair of values in a given pair of columns.) This answers a question of Barak et al. (PODS '07), who gave an algorithm running in time $\mathrm{poly}(n,2^d)$. Our proof combines a construction of hard-to-sanitize databases based on digital signatures (by Dwork et al., STOC '09) with encodings based on the PCP theorem.
We also present both negative and positive results for generating “relaxed” synthetic data, where the fraction of rows in $D$ satisfying a predicate $c$ is estimated by applying $c$ to each row of $\hat{D}$ and aggregating the results in some way.
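
To make the definition of a two-way marginal concrete, the following is a minimal sketch that computes one marginal and the largest gap between the two-way marginals of a database and those of a candidate synthetic database. The function names, the tuple-of-bits row representation, and the toy data are assumptions made for illustration; the code provides no privacy guarantee and is not the algorithm of Barak et al.

```python
from itertools import combinations, product


def two_way_marginal(db, i, j, a, b):
    """Fraction of rows x in db with x[i] == a and x[j] == b."""
    return sum(1 for x in db if x[i] == a and x[j] == b) / len(db)


def max_marginal_error(db, synth):
    """Largest absolute gap over all two-way marginals of db and synth."""
    d = len(db[0])  # number of columns
    return max(
        abs(two_way_marginal(db, i, j, a, b)
            - two_way_marginal(synth, i, j, a, b))
        for i, j in combinations(range(d), 2)   # every pair of columns
        for a, b in product((0, 1), repeat=2)   # every pair of bit values
    )


# Hypothetical toy databases with n = 4 rows and d = 3 columns.
D = [(0, 1, 1), (1, 1, 0), (0, 1, 0), (1, 0, 1)]
D_hat = [(0, 1, 1), (1, 1, 0), (0, 1, 0), (1, 1, 1)]

print(two_way_marginal(D, 0, 1, 0, 1))
print(max_marginal_error(D, D_hat))
```

There are only $4\binom{d}{2}$ two-way marginals, so checking the accuracy of a given $\hat{D}$ takes time polynomial in $n$ and $d$. The hardness result concerns producing such a $\hat{D}$ under differential privacy, not verifying it.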
