An approximate microaggregation approach for microdata protection

Microdata protection is a hot topic in the field of Statistical Disclosure Control, which has gained special interest after the disclosure of 658,000 queries by the America Online (AOL) search engine in August 2006. Many algorithms, methods and properties have been proposed to deal with microdata disclosure. One of the emerging concepts in microdata protection is k-anonymity, introduced by Samarati and Sweeney. k-Anonymity provides a simple and efficient approach to protect private individual information and is gaining increasing popularity. k-Anonymity requires that every record in the microdata table released be indistinguishably related to no fewer than k respondents. In this paper, we apply the concept of entropy to propose a distance metric to evaluate the amount of mutual information among records in microdata, and propose a method of constructing dependency tree to find the key attributes, which we then use to process approximate microaggregation. Further, we adopt this new microaggregation technique to study k-anonymity problem, and an efficient algorithm is developed. Experimental results show that the proposed microaggregation technique is efficient and effective in the terms of running time and information loss.

[1]  Adam Meyerson,et al.  On the complexity of optimal K-anonymity , 2004, PODS.

[2]  Hua Wang,et al.  An efficient hash-based algorithm for minimal k-anonymity , 2008, ACSC.

[3]  Agusti Solanas,et al.  Privacy Protection in Location-Based Services Through a Public-Key Privacy Homomorphism , 2007, EuroPKI.

[4]  Kaizhu Huang,et al.  Constructing a large node Chow-Liu tree based on frequent itemsets , 2002, Proceedings of the 9th International Conference on Neural Information Processing, 2002. ICONIP '02..

[5]  Josep Domingo-Ferrer,et al.  On the complexity of optimal microaggregation for statistical disclosure control , 2001 .

[6]  A. Solanas,et al.  V-MDAV : A Multivariate Microaggregation With Variable Group Size , 2006 .

[7]  Josep Domingo-Ferrer,et al.  Practical Data-Oriented Microaggregation for Statistical Disclosure Control , 2002, IEEE Trans. Knowl. Data Eng..

[8]  ASHWIN MACHANAVAJJHALA,et al.  L-diversity: privacy beyond k-anonymity , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[9]  Ninghui Li,et al.  t-Closeness: Privacy Beyond k-Anonymity and l-Diversity , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[10]  Hua Wang,et al.  (p+, α)-sensitive k-anonymity: A new enhanced privacy protection model , 2008, 2008 8th IEEE International Conference on Computer and Information Technology.

[11]  Raymond Chi-Wing Wong,et al.  (α, k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing , 2006, KDD '06.

[12]  Pierangela Samarati,et al.  Generalizing Data to Provide Anonymity when Disclosing Information , 1998, PODS 1998.

[13]  C. N. Liu,et al.  Approximating discrete probability distributions with dependence trees , 1968, IEEE Trans. Inf. Theory.

[14]  Hua Wang,et al.  On the Complexity of Restricted k-anonymity Problem , 2008, APWeb.

[15]  A. Meyer The Health Insurance Portability and Accountability Act. , 1997, Tennessee medicine : journal of the Tennessee Medical Association.

[16]  Martin Rosemann Erste Ergebnisse von vergleichenden Untersuchungen mit anonymisierten und nicht anonymisierten Einzeldaten am Beispiel der Kostenstrukturerhebung und der Umsatzsteuerstatistik , 2003 .

[17]  V. Torra,et al.  Aggregation techniques for statistical confidentiality , 2002 .

[18]  Latanya Sweeney,et al.  Achieving k-Anonymity Privacy Protection Using Generalization and Suppression , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[19]  Ronald L. Rivest,et al.  Introduction to Algorithms, Second Edition , 2001 .

[20]  Gao Cong,et al.  Summarizing Frequent Patterns Using Profiles , 2006, DASFAA.

[21]  Josep Domingo-Ferrer,et al.  Ordinal, Continuous and Heterogeneous k-Anonymity Through Microaggregation , 2005, Data Mining and Knowledge Discovery.

[22]  Jan C. van der Lubbe,et al.  Information theory , 1997 .

[23]  Pierangela Samarati,et al.  Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression , 1998 .

[24]  Rajeev Motwani,et al.  Anonymizing Tables , 2005, ICDT.

[25]  Pierangela Samarati,et al.  Protecting Respondents' Identities in Microdata Release , 2001, IEEE Trans. Knowl. Data Eng..

[26]  Josep Domingo-Ferrer,et al.  On the connections between statistical disclosure control for microdata and some artificial intelligence tools , 2003, Inf. Sci..

[27]  David J. DeWitt,et al.  Incognito: efficient full-domain K-anonymity , 2005, SIGMOD '05.

[28]  Nir Friedman,et al.  Bayesian Network Classifiers , 1997, Machine Learning.

[29]  Ramayya Krishnan,et al.  On privacy-preserving access to distributed heterogeneous healthcare information , 2004, 37th Annual Hawaii International Conference on System Sciences, 2004. Proceedings of the.

[30]  Josep Domingo-Ferrer,et al.  A distributed architecture for scalable private RFID tag identification , 2007, Comput. Networks.