Paper information - An Efficient Clustering Algorithm for k-Anonymisation

K-anonymisation is an approach to protecting individuals from being identified from data. Good k-anonymisations should retain data utility and preserve privacy, but few methods have considered these two conflicting requirements together. In this paper, we extend our previous work on a clustering-based method for balancing data utility and privacy protection, and propose a set of heuristics to improve its effectiveness. We introduce new clustering criteria that treat utility and privacy on equal terms and propose sampling-based techniques to optimally set up its parameters. Extensive experiments show that the extended method achieves good accuracy in query answering and is able to prevent linking attacks effectively.

Grigorios Loukides | Jianhua Shao
