Data utility and privacy protection trade-off in k-anonymisation

K-anonymisation is an approach to protecting the private information contained in a dataset. A good k-anonymisation algorithm should anonymise a dataset so that the private information it contains is hidden, yet the anonymised data remains useful for its intended applications. However, data utility and privacy protection cannot both be maximised at the same time in k-anonymisation. Existing methods derive k-anonymisations by trying to maximise utility while satisfying a required level of protection. In this paper, we propose a method that attempts to optimise the trade-off between utility and protection. We introduce a measure that captures both utility and protection, and an algorithm that exploits this measure using a combination of clustering and partitioning techniques. Our experiments show that the proposed method can produce k-anonymisations with the required utility/protection trade-off, with performance that scales to large datasets.
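The abstract does not spell out the authors' combined measure, so the following is only an illustrative Python sketch of the general idea, not the method from the paper. It assumes purely numeric quasi-identifiers and a numeric sensitive attribute. Utility is scored with a range-based information-loss term, protection with the spread of sensitive values inside each equivalence class, and the two are combined with a weight `w`. All names (`is_k_anonymous`, `information_loss`, `protection`, `tradeoff_score`, `w`) are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumption: a record is (numeric quasi-identifier values, numeric sensitive value).
Record = Tuple[Tuple[float, ...], float]
Group = List[Record]


def is_k_anonymous(groups: Sequence[Group], k: int) -> bool:
    """True if every equivalence class (group) holds at least k records."""
    return all(len(g) >= k for g in groups)


def _spans(groups: Sequence[Group]) -> Tuple[List[float], float]:
    """Observed range of each quasi-identifier and of the sensitive value
    over all records (used here as the attribute's domain, an assumption)."""
    records = [r for g in groups for r in g]
    dims = len(records[0][0])
    qi_spans = [max(r[0][a] for r in records) - min(r[0][a] for r in records)
                for a in range(dims)]
    sens = [r[1] for r in records]
    return qi_spans, max(sens) - min(sens)


def _norm(width: float, span: float) -> float:
    """Width relative to the full span; 0 when the attribute is constant."""
    return width / span if span > 0 else 0.0


def information_loss(groups: Sequence[Group]) -> float:
    """Hypothetical utility cost in [0, 1]: size-weighted mean of each group's
    quasi-identifier range relative to that attribute's full range."""
    qi_spans, _ = _spans(groups)
    n = sum(len(g) for g in groups)
    total = 0.0
    for g in groups:
        for a, span in enumerate(qi_spans):
            vals = [r[0][a] for r in g]
            total += len(g) * _norm(max(vals) - min(vals), span)
    return total / (n * len(qi_spans))


def protection(groups: Sequence[Group]) -> float:
    """Hypothetical protection proxy in [0, 1]: size-weighted mean spread of
    the sensitive value inside each group (wider spread = harder to infer)."""
    _, s_span = _spans(groups)
    n = sum(len(g) for g in groups)
    total = 0.0
    for g in groups:
        vals = [r[1] for r in g]
        total += len(g) * _norm(max(vals) - min(vals), s_span)
    return total / n


def tradeoff_score(groups: Sequence[Group], w: float = 0.5) -> float:
    """Hypothetical combined score: w * utility + (1 - w) * protection."""
    return w * (1.0 - information_loss(groups)) + (1.0 - w) * protection(groups)


if __name__ == "__main__":
    # Toy records: (age,) as quasi-identifier, salary as sensitive value.
    r1, r2, r3, r4 = ((20.0,), 10.0), ((22.0,), 50.0), ((40.0,), 12.0), ((44.0,), 48.0)
    by_age = [[r1, r2], [r3, r4]]  # close ages, varied salaries
    mixed = [[r1, r3], [r2, r4]]   # distant ages, near-identical salaries
    for name, gs in (("by_age", by_age), ("mixed", mixed)):
        print(name, is_k_anonymous(gs, 2),
              round(information_loss(gs), 3), round(protection(gs), 3))
```

Both toy groupings are 2-anonymous, yet `by_age` has information loss 0.125 and protection 0.95, while `mixed` has 0.875 and 0.05. This shows why satisfying k alone says little about utility or protection, and why a single measure that reflects both can be used to compare candidate groupings. In realistic data the two terms usually pull in opposite directions, and `w` sets where on that trade-off a grouping is judged.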

Grigorios Loukides, Jianhua Shao