Data utility and privacy protection trade-off in k-anonymisation

K-anonymisation is an approach to protecting privacy contained within a dataset. A good k-anonymisation algorithm should anonymise a dataset in such a way that private information contained within it is hidden, yet the anonymised data is still useful in intended applications. However, maximising both data utility and privacy protection in k-anonymisation is not possible. Existing methods derive k-anonymisations by trying to maximise utility while satisfying a required level of protection. In this paper, we propose a method that attempts to optimise the trade-off between utility and protection. We introduce a measure that captures both utility and protection, and an algorithm that exploits this measure using a combination of clustering and partitioning techniques. Our experiments show that the proposed method is capable of producing k-anonymisations with required utility and protection trade-off and with a performance scalable to large datasets.

[1]  David J. DeWitt,et al.  Mondrian Multidimensional K-Anonymity , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[2]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[3]  Vijay S. Iyengar,et al.  Transforming data to satisfy privacy constraints , 2002, KDD.

[4]  Ton de Waal,et al.  Statistical Disclosure Control in Practice , 1996 .

[5]  Feng Zhu,et al.  On Multidimensional k-Anonymity with Local Recoding Generalization , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[6]  C. A. Murthy,et al.  Maxdiff kd-trees for data condensation , 2006, Pattern Recognit. Lett..

[7]  Jian Pei,et al.  Utility-based anonymization using local recoding , 2006, KDD '06.

[8]  Yufei Tao,et al.  Personalized privacy preservation , 2006, Privacy-Preserving Data Mining.

[9]  Pierangela Samarati,et al.  Protecting Respondents' Identities in Microdata Release , 2001, IEEE Trans. Knowl. Data Eng..

[10]  Hans-Peter Kriegel,et al.  The R*-tree: an efficient and robust access method for points and rectangles , 1990, SIGMOD '90.

[11]  Ninghui Li,et al.  t-Closeness: Privacy Beyond k-Anonymity and l-Diversity , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[12]  David J. DeWitt,et al.  Incognito: efficient full-domain K-anonymity , 2005, SIGMOD '05.

[13]  Christian Böhm,et al.  Improving the Query Performance of High-Dimensional Index Structures by Bulk-Load Operations , 1998, EDBT.

[14]  Andrew McCallum,et al.  Efficient clustering of high-dimensional data sets with application to reference matching , 2000, KDD '00.

[15]  Grigorios Loukides,et al.  Capturing data usefulness and privacy protection in K-anonymisation , 2007, SAC '07.

[16]  Rajeev Motwani,et al.  Approximation Algorithms for k-Anonymity , 2005 .

[17]  Roberto J. Bayardo,et al.  Data privacy through optimal k-anonymization , 2005, 21st International Conference on Data Engineering (ICDE'05).

[18]  Jon Louis Bentley,et al.  An Algorithm for Finding Best Matches in Logarithmic Expected Time , 1977, TOMS.

[19]  Traian Marius Truta,et al.  Protection : p-Sensitive k-Anonymity Property , 2006 .

[20]  Christopher J. Merz,et al.  UCI Repository of Machine Learning Databases , 1996 .

[21]  ASHWIN MACHANAVAJJHALA,et al.  L-diversity: privacy beyond k-anonymity , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[22]  Raymond Chi-Wing Wong,et al.  (α, k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing , 2006, KDD '06.

[23]  Josep Domingo-Ferrer,et al.  Practical Data-Oriented Microaggregation for Statistical Disclosure Control , 2002, IEEE Trans. Knowl. Data Eng..

[24]  Sudipto Guha,et al.  CURE: an efficient clustering algorithm for large databases , 1998, SIGMOD '98.

[25]  Elisa Bertino,et al.  Efficient k -Anonymization Using Clustering Techniques , 2007, DASFAA.

[26]  Elisa Bertino,et al.  EFFICIENT K-ANONYMITY USING CLUSTERING TECHNIQUE , 2006 .

[27]  Ling Liu,et al.  Location Privacy in Mobile Systems: A Personalized Anonymization Model , 2005, 25th IEEE International Conference on Distributed Computing Systems (ICDCS'05).