Distributed Privacy Preserving Data Collection

We study the distributed privacy preserving data collection problem: an untrusted data collector (e.g., a medical research institute) wishes to collect data (e.g., medical records) from a group of respondents (e.g., patients). Each respondent owns a multi-attributed record which contains both non-sensitive (e.g., quasi-identifiers) and sensitive information (e.g., a particular disease), and submits it to the data collector. Assuming T is the table formed by all the respondent data records, we say that the data collection process is privacy preserving if it allows the data collector to obtain a k-anonymized or l-diversified version of T without revealing the original records to the adversary. We propose a distributed data collection protocol that outputs an anonymized table by generalization of quasi-identifier attributes. The protocol employs cryptographic techniques such as homomorphic encryption, private information retrieval and secure multiparty computation to ensure the privacy goal in the process of data collection. Meanwhile, the protocol is designed to leak limited but noncritical information to achieve practicability and efficiency. Experiments show that the utility of the anonymized table derived by our protocol is in par with the utility achieved by traditional anonymization techniques.

[1]  Christos Faloutsos,et al.  Analysis of the Clustering Properties of the Hilbert Space-Filling Curve , 2001, IEEE Trans. Knowl. Data Eng..

[2]  David J. DeWitt,et al.  Incognito: efficient full-domain K-anonymity , 2005, SIGMOD '05.

[3]  I. Damglurd Unconditionally secure constant-rounds multi-party computation for equality, comparison, bits and exponentiation , 2006 .

[4]  Li Xiong,et al.  Privacy-preserving data publishing for horizontally partitioned databases , 2008, CIKM '08.

[5]  Panos Kalnis,et al.  Fast Data Anonymization with Low Information Loss , 2007, VLDB.

[6]  Sheng Zhong,et al.  k-Anonymous data collection , 2009, Inf. Sci..

[7]  Pascal Paillier,et al.  Public-Key Cryptosystems Based on Composite Degree Residuosity Classes , 1999, EUROCRYPT.

[8]  Pierangela Samarati,et al.  Generalizing Data to Provide Anonymity when Disclosing Information , 1998, PODS 1998.

[9]  Sheng Zhong,et al.  Privacy-enhancing k-anonymization of customer data , 2005, PODS.

[10]  Latanya Sweeney,et al.  k-Anonymity: A Model for Protecting Privacy , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[11]  Adi Shamir,et al.  How to share a secret , 1979, CACM.

[12]  Craig Gentry,et al.  Single-Database Private Information Retrieval with Constant Communication Rate , 2005, ICALP.

[13]  Adam Meyerson,et al.  On the complexity of optimal K-anonymity , 2004, PODS.

[14]  Vitaly Shmatikov,et al.  Efficient anonymity-preserving data collection , 2006, KDD '06.

[15]  Roberto J. Bayardo,et al.  Data privacy through optimal k-anonymization , 2005, 21st International Conference on Data Engineering (ICDE'05).

[16]  John Bloom,et al.  A modular approach to key safeguarding , 1983, IEEE Trans. Inf. Theory.

[17]  Sheng Zhong,et al.  Anonymity-preserving data collection , 2005, KDD '05.

[18]  Ashwin Machanavajjhala,et al.  l-Diversity: Privacy Beyond k-Anonymity , 2006, ICDE.

[19]  Ali Aydin Selçuk,et al.  Threshold Cryptography Based on Asmuth-Bloom Secret Sharing , 2006, ISCIS.