Automated k-Anonymization and l-Diversity for Shared Data Privacy

Analyzing data is a cost-intensive process, particularly for organizations lacking the necessary in-house human and computational capital. Outsourcing data analytics offers a cost-effective alternative, but data sensitivity and query response time requirements make data protection a necessary pre-processing step. For performance and privacy reasons, anonymization is preferred over encryption. Yet manual anonymization is time-intensive and error-prone. Automated anonymization is a better alternative, but it must satisfy the conflicting objectives of utility and privacy. In this paper, we present an automated anonymization scheme that extends the standard k-anonymization and l-diversity algorithms to satisfy the dual objectives of data utility and privacy. We use a multi-objective optimization scheme that employs a weighting mechanism to minimize information loss and maximize privacy. Our results show that automating l-diversity adds an average information loss of 7 % over automated k-anonymization, but yields a diversity of 9–14 % compared to 10–30 % in k-anonymized datasets. The lesson that emerges is that automated l-diversity offers better privacy than k-anonymization, with negligible information loss.
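
To make the weighted multi-objective idea concrete, the following minimal sketch combines a normalized information-loss term with a k-anonymity/l-diversity violation penalty under a single weighted objective. It is an illustrative assumption-based example, not the paper's implementation; all function names, the penalty form, and the weights are hypothetical.

```python
def equivalence_classes(records, quasi_identifiers):
    """Group records by their (generalized) quasi-identifier values."""
    groups = {}
    for rec in records:
        key = tuple(rec[a] for a in quasi_identifiers)
        groups.setdefault(key, []).append(rec)
    return list(groups.values())


def information_loss(generalized, original, quasi_identifiers):
    """Fraction of quasi-identifier cells changed by generalization (0..1)."""
    changed = sum(
        gen[a] != orig[a]
        for gen, orig in zip(generalized, original)
        for a in quasi_identifiers
    )
    return changed / (len(generalized) * len(quasi_identifiers))


def privacy_score(records, quasi_identifiers, sensitive, k, l):
    """1.0 if every equivalence class is k-anonymous and l-diverse,
    otherwise reduced in proportion to the number of violating classes."""
    classes = equivalence_classes(records, quasi_identifiers)
    violations = sum(
        1 for cls in classes
        if len(cls) < k or len({r[sensitive] for r in cls}) < l
    )
    return 1.0 - violations / len(classes)


def fitness(generalized, original, quasi_identifiers, sensitive,
            k=2, l=2, w_loss=0.5, w_privacy=0.5):
    """Weighted multi-objective score: lower is better.
    w_loss and w_privacy steer the utility/privacy trade-off."""
    loss = information_loss(generalized, original, quasi_identifiers)
    priv = privacy_score(generalized, quasi_identifiers, sensitive, k, l)
    return w_loss * loss + w_privacy * (1.0 - priv)


if __name__ == "__main__":
    # Toy data: two records generalized into one 2-anonymous, 2-diverse class.
    original = [
        {"age": 34, "zip": "7700", "disease": "flu"},
        {"age": 36, "zip": "7701", "disease": "hiv"},
    ]
    generalized = [
        {"age": "30-39", "zip": "77**", "disease": "flu"},
        {"age": "30-39", "zip": "77**", "disease": "hiv"},
    ]
    print(fitness(generalized, original, ["age", "zip"], "disease", k=2, l=2))
```

In practice, a fitness function of this kind would be evaluated inside a search over candidate generalizations (e.g., a genetic algorithm), with the weights expressing the desired balance between information loss and privacy.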
