Automated k-Anonymization and l-Diversity for Shared Data Privacy

Analyzing data is a cost-intensive process, particularly for organizations lacking the necessary in-house human and computational capital. Outsourcing data analytics offers a cost-effective alternative, but data sensitivity and query response time requirements make data protection a necessary pre-processing step. For performance and privacy reasons, anonymization is preferred over encryption. Yet manual anonymization is time-intensive and error-prone. Automated anonymization is a better alternative, but it must satisfy the conflicting objectives of utility and privacy. In this paper, we present an automated anonymization scheme that extends the standard k-anonymization and l-diversity algorithms to satisfy the dual objectives of data utility and privacy. We use a multi-objective optimization scheme that employs a weighting mechanism to minimize information loss and maximize privacy. Our results show that automating l-diversity adds an average information loss of 7 % over automated k-anonymization, but yields a diversity of 9–14 % compared to 10–30 % in k-anonymized datasets. The lesson that emerges is that automated l-diversity offers better privacy than k-anonymization, with negligible information loss.
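
To make the weighted multi-objective idea concrete, the following minimal sketch combines a normalized information-loss term with a k-anonymity/l-diversity violation penalty under a single weighted objective. It is an illustrative assumption-based example, not the paper's implementation; all function names, the penalty form, and the weights are hypothetical.

```python
def equivalence_classes(records, quasi_identifiers):
    """Group records by their (generalized) quasi-identifier values."""
    groups = {}
    for rec in records:
        key = tuple(rec[a] for a in quasi_identifiers)
        groups.setdefault(key, []).append(rec)
    return list(groups.values())


def information_loss(generalized, original, quasi_identifiers):
    """Fraction of quasi-identifier cells changed by generalization (0..1)."""
    changed = sum(
        gen[a] != orig[a]
        for gen, orig in zip(generalized, original)
        for a in quasi_identifiers
    )
    return changed / (len(generalized) * len(quasi_identifiers))


def privacy_score(records, quasi_identifiers, sensitive, k, l):
    """1.0 if every equivalence class is k-anonymous and l-diverse,
    otherwise reduced in proportion to the number of violating classes."""
    classes = equivalence_classes(records, quasi_identifiers)
    violations = sum(
        1 for cls in classes
        if len(cls) < k or len({r[sensitive] for r in cls}) < l
    )
    return 1.0 - violations / len(classes)


def fitness(generalized, original, quasi_identifiers, sensitive,
            k=2, l=2, w_loss=0.5, w_privacy=0.5):
    """Weighted multi-objective score: lower is better.
    w_loss and w_privacy steer the utility/privacy trade-off."""
    loss = information_loss(generalized, original, quasi_identifiers)
    priv = privacy_score(generalized, quasi_identifiers, sensitive, k, l)
    return w_loss * loss + w_privacy * (1.0 - priv)


if __name__ == "__main__":
    # Toy data: two records generalized into one 2-anonymous, 2-diverse class.
    original = [
        {"age": 34, "zip": "7700", "disease": "flu"},
        {"age": 36, "zip": "7701", "disease": "hiv"},
    ]
    generalized = [
        {"age": "30-39", "zip": "77**", "disease": "flu"},
        {"age": "30-39", "zip": "77**", "disease": "hiv"},
    ]
    print(fitness(generalized, original, ["age", "zip"], "disease", k=2, l=2))
```

In practice, a fitness function of this kind would be evaluated inside a search over candidate generalizations (e.g., a genetic algorithm), with the weights expressing the desired balance between information loss and privacy.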
