Aggregate Query Answering on Anonymized Tables

Privacy is a serious concern when microdata need to be released for ad hoc analyses. The privacy goals of existing privacy protection approaches (e.g., k-anonymity and l-diversity) are suitable only for categorical sensitive attributes. Since applying them directly to numerical sensitive attributes (e.g., salary) may result in undesirable information leakage, we propose privacy goals to better capture the need of privacy protection for numerical sensitive attributes. Complementing the desire for privacy is the need to support ad hoc aggregate analyses over microdata. Existing generalization-based anonymization approaches cannot answer aggregate queries with reasonable accuracy. We present a general framework of permutation-based anonymization to support accurate answering of aggregate queries and show that, for the same grouping, permutation-based techniques can always answer aggregate queries more accurately than generalization-based approaches. We further propose several criteria to optimize permutations for accurate answering of aggregate queries, and develop efficient algorithms for each criterion.

[1]  Yufei Tao,et al.  Anatomy: simple and effective privacy preservation , 2006, VLDB.

[2]  Ashwin Machanavajjhala,et al.  l-Diversity: Privacy Beyond k-Anonymity , 2006, ICDE.

[3]  Sushil Jajodia,et al.  Checking for k-Anonymity Violation by Views , 2005, VLDB.

[4]  Stephen E. Fienberg,et al.  Data Swapping: Variations on a Theme by Dalenius and Reiss , 2004, Privacy in Statistical Databases.

[5]  Latanya Sweeney,et al.  k-Anonymity: A Model for Protecting Privacy , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[6]  Rajeev Motwani,et al.  Anonymizing Tables , 2005, ICDT.

[7]  Rathindra Sarathy,et al.  Security of random data perturbation methods , 1999, TODS.

[8]  David J. DeWitt,et al.  Multidimensional K-Anonymity , 2005 .

[9]  Adam Meyerson,et al.  On the complexity of optimal K-anonymity , 2004, PODS.

[10]  Roberto J. Bayardo,et al.  Data privacy through optimal k-anonymization , 2005, 21st International Conference on Data Engineering (ICDE'05).

[11]  David J. DeWitt,et al.  Incognito: efficient full-domain K-anonymity , 2005, SIGMOD '05.

[12]  Alexandre V. Evfimievski,et al.  Privacy preserving mining of association rules , 2002, Inf. Syst..

[13]  Ramakrishnan Srikant,et al.  Privacy-preserving data mining , 2000, SIGMOD '00.

[14]  Latanya Sweeney,et al.  Achieving k-Anonymity Privacy Protection Using Generalization and Suppression , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[15]  Nabil R. Adam,et al.  Security-control methods for statistical databases: a comparative study , 1989, ACM Comput. Surv..

[16]  S. Reiss,et al.  Data-swapping: A technique for disclosure control , 1982 .

[17]  Ninghui Li,et al.  t-Closeness: Privacy Beyond k-Anonymity and l-Diversity , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[18]  Latanya Sweeney,et al.  Guaranteeing anonymity when sharing medical data, the Datafly System , 1997, AMIA.