Privacy via pseudorandom sketches

Imagine a collection of individuals who each possess private data that they do not wish to share with a third party. This paper considers how individuals may represent and publish their own data so as to simultaneously preserve their privacy and ensure that large-scale statistical behavior of the original, unperturbed data can still be extracted. Existing techniques for perturbing data are limited by the number of users required to obtain approximate answers to queries, the richness of preserved statistical behavior, the privacy guarantees given, and/or the amount of data that each individual must publish.

This paper introduces a new technique, based on pseudorandom sketches, for describing parts of an individual's data. The sketches guarantee that each individual's privacy is provably maintained under one of the strongest definitions of privacy that we are aware of: given unlimited computational power and arbitrary partial knowledge, the attacker cannot learn any additional private information from the published sketches. At the same time, sketches from multiple users that describe a subset of attributes can be used to estimate the fraction of users that satisfy any conjunction over the full set of negated or unnegated attributes, provided that there are enough users. We show that the approximation error is independent of the number of attributes involved and depends only on the number of users available. An additional benefit is that the size of the sketch is minuscule: O(log log M) bits, where M is the number of users. Finally, we show how sketches can be combined to answer more complex queries. An interesting property of our approach is that, despite using cryptographic primitives, our privacy guarantees do not rely on any unproven cryptographic conjectures.
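
As a rough illustration of how such sketches can support conjunctive estimates, here is a minimal, hypothetical Python example under simplifying assumptions; it is not the paper's construction. It stands in a plain hash (SHA-256) for a pseudorandom function and publishes a full seed plus one bit rather than an O(log log M)-bit sketch. Each user releases a fresh random seed together with one pseudorandom bit of their record; the aggregator estimates the fraction of users matching a query from how often that bit agrees with the bit recomputed on the query value. All names and parameters below are illustrative.

```
# Hypothetical illustration of privacy-preserving pseudorandom sketches.
# NOT the paper's exact scheme: SHA-256 stands in for a pseudorandom
# function, and the published sketch is an 8-byte seed plus one bit.
import hashlib
import os
import random

def publish_sketch(record):
    """User side: release a fresh seed and one pseudorandom bit of the record.
    `record` is a tuple of 0/1 attribute values; the single bit reveals
    essentially nothing about the record on its own."""
    seed = os.urandom(8)
    bit = hashlib.sha256(seed + bytes(record)).digest()[0] & 1
    return seed, bit

def estimate_fraction(sketches, query):
    """Aggregator side: estimate the fraction of users whose hidden record
    equals `query` (a full 0/1 assignment to the sketched attributes).
    A matching record always reproduces the published bit; a non-matching
    one agrees with probability ~1/2, so p is estimated as 2*rate - 1."""
    matches = sum(
        1 for seed, bit in sketches
        if (hashlib.sha256(seed + bytes(query)).digest()[0] & 1) == bit
    )
    return 2 * matches / len(sketches) - 1

# Toy run: 100,000 users over two binary attributes; 30% hold record (1, 0).
users = [(1, 0) if random.random() < 0.3 else (0, 1) for _ in range(100_000)]
sketches = [publish_sketch(u) for u in users]
print(estimate_fraction(sketches, (1, 0)))  # ~0.30, error on the order of 1/sqrt(M)
```

Because each user's agreement bit is an independent coin flip when the record does not match, a Hoeffding bound gives an estimation error on the order of 1/sqrt(M) regardless of how many attributes the conjunction touches, consistent with the abstract's claim that the error depends only on the number of users.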
