The price of privacy and the limits of LP decoding

This work is at the intersection of two lines of research. One line, initiated by Dinur and Nissim, investigates the price, in accuracy, of protecting privacy in a statistical database. The second, growing from an extensive literature on compressed sensing (see in particular the work of Donoho and collaborators [4,7,13,11]) and explicitly connected to error-correcting codes by Candès and Tao ([4]; see also [5,3]), is in the use of linear programming for error correction. Our principal result is the discovery of a sharp threshold ρ* ≈ 0.239, so that if ρ < ρ* and A is a random m × n encoding matrix of independently chosen standard Gaussians, where m = O(n), then with overwhelming probability over choice of A, for all x ∈ R^n, LP decoding corrects ⌊ρm⌋ arbitrary errors in the encoding Ax, while decoding can be made to fail if the error rate exceeds ρ*. Our bound resolves an open question of Candès, Rudelson, Tao, and Vershynin [3] and (oddly, but explicably) refutes empirical conclusions of Donoho [11] and Candès et al. [3]. By scaling and rounding we can easily transform these results to obtain polynomial-time decodable random linear codes with polynomial-sized alphabets tolerating any ρ < ρ* ≈ 0.239 fraction of arbitrary errors. In the context of privacy-preserving datamining our results say that any privacy mechanism, interactive or non-interactive, providing reasonably accurate answers to a 0.761 fraction of randomly generated weighted subset sum queries, and arbitrary answers on the remaining 0.239 fraction, is blatantly non-private.
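For readers unfamiliar with the term, LP decoding is commonly formulated as an ℓ1-minimization problem, which can be rewritten as a linear program. The sketch below states that standard formulation. The symbols y (received word), e (error vector), z (candidate message), and t (slack variables) are notation assumed here for illustration and do not appear in the abstract.

```latex
% Assumed notation (not in the abstract): y is the received word,
% e the error vector, z a candidate message, t auxiliary slack variables.
\begin{align*}
  y &= Ax + e, \qquad |\{\, i : e_i \neq 0 \,\}| \le \lfloor \rho m \rfloor, \\
  \hat{x} &= \arg\min_{z \in \mathbb{R}^n} \| y - Az \|_1 .
\end{align*}
% Equivalent linear program over (z, t) in R^n x R^m:
\begin{align*}
  \text{minimize}\quad   & \sum_{i=1}^{m} t_i \\
  \text{subject to}\quad & -t_i \le y_i - (Az)_i \le t_i, \qquad i = 1, \dots, m.
\end{align*}
```

Under this reading, the main result says that for ρ < ρ* the minimizer equals x for every x, with overwhelming probability over A. The privacy statement uses the complementary fraction: 1 − 0.239 = 0.761.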

[1]  M. Talagrand.  Concentration of measure and isoperimetric inequalities in product spaces , 1994, math/9406212.

[2]  I. Johnstone,et al.  Minimax estimation via wavelet shrinkage , 1998 .

[3]  Michael A. Saunders,et al.  Atomic Decomposition by Basis Pursuit , 1998, SIAM J. Sci. Comput..

[4]  M. Ledoux.  The concentration of measure phenomenon , 2001 .

[5]  Xiaoming Huo,et al.  Uncertainty principles and ideal atomic decomposition , 2001, IEEE Trans. Inf. Theory.

[6]  Dimitris Achlioptas,et al.  Database-friendly random projections: Johnson-Lindenstrauss with binary coins , 2003, J. Comput. Syst. Sci..

[7]  Irit Dinur,et al.  Revealing information while preserving privacy , 2003, PODS.

[8]  Jon Feldman,et al.  Decoding error-correcting codes via linear programming , 2003 .

[9]  Martin J. Wainwright,et al.  LP Decoding Corrects a Constant Fraction of Errors , 2004, IEEE Transactions on Information Theory.

[10]  Cynthia Dwork,et al.  Privacy-Preserving Datamining on Vertically Partitioned Databases , 2004, CRYPTO.

[11]  Jon Feldman,et al.  Decoding turbo-like codes via linear programming , 2004, J. Comput. Syst. Sci..

[12]  E. Candès,et al.  Error correction via linear programming , 2005, 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS'05).

[13]  Jon Feldman,et al.  LP decoding achieves capacity , 2005, SODA '05.

[14]  Emmanuel J. Candès,et al.  Decoding by linear programming , 2005, IEEE Transactions on Information Theory.

[15]  C. Dwork,et al.  Privacy in public databases: A foundational approach , 2005 .

[16]  Cynthia Dwork,et al.  Practical privacy: the SuLQ framework , 2005, PODS.

[17]  D. Donoho.  For most large underdetermined systems of linear equations the minimal 𝓁1‐norm solution is also the sparsest solution , 2006 .

[18]  D. Donoho.  For most large underdetermined systems of equations, the minimal 𝓁1‐norm near‐solution approximates the sparsest near‐solution , 2006 .

[19]  David L. Donoho,et al.  High-Dimensional Centrally Symmetric Polytopes with Neighborliness Proportional to Dimension , 2006, Discret. Comput. Geom..

[20]  D. Donoho,et al.  Thresholds for the Recovery of Sparse Solutions via L1 Minimization , 2006, 2006 40th Annual Conference on Information Sciences and Systems.

[21]  Cynthia Dwork,et al.  Calibrating Noise to Sensitivity in Private Data Analysis , 2006, TCC.

[22]  L. Goldstein.  L1 bounds in normal approximation , 2007, 0710.3262.

[23]  Emmanuel J. Candès,et al.  Highly Robust Error Correction by Convex Programming , 2006, IEEE Transactions on Information Theory.