A Systematic Comparison and Evaluation of k-Anonymization Algorithms for Practitioners

The vast amount of data being collected about individuals has brought new challenges in protecting their privacy when this data is disseminated. As a result, Privacy-Preserving Data Publishing has become an active research area, in which multiple anonymization algorithms have been proposed. However, given the large number of algorithms available and limited information regarding their performance, it is difficult to identify and select the most appropriate algorithm given a particular publishing scenario, especially for practitioners. In this paper, we perform a systematic comparison of three well-known k-anonymization algorithms to measure their efficiency (in terms of resources usage) and their effectiveness (in terms of data utility). We extend the scope of their original evaluation by employing a more comprehensive set of scenarios: different parameters, metrics and datasets. Using publicly available implementations of those algorithms, we conduct a series of experiments and a comprehensive analysis to identify the factors that influence their performance, in order to guide practitioners in the selection of an algorithm. We demonstrate through experimental evaluation, the conditions in which one algorithm outperforms the others for a particular metric, depending on the input dataset and privacy requirements. Our findings motivate the necessity of creating methodologies that provide recommendations about the best algorithm given a particular publishing scenario.

[1]  Yufei Tao,et al.  Anatomy: simple and effective privacy preservation , 2006, VLDB.

[2]  David J. DeWitt,et al.  Incognito: efficient full-domain K-anonymity , 2005, SIGMOD '05.

[3]  Claudia Eckert,et al.  Flash: Efficient, Stable and Optimal K-Anonymity , 2012, 2012 International Conference on Privacy, Security, Risk and Trust and 2012 International Confernece on Social Computing.

[4]  Alina Campan,et al.  Generating Microdata with P -Sensitive K -Anonymity Property , 2007, Secure Data Management.

[5]  Philip S. Yu,et al.  Bottom-up generalization: a data mining solution to privacy protection , 2004, Fourth IEEE International Conference on Data Mining (ICDM'04).

[6]  Michael J. Laszlo,et al.  Minimum spanning tree partitioning algorithm for microaggregation , 2005, IEEE Transactions on Knowledge and Data Engineering.

[7]  Latanya Sweeney,et al.  Achieving k-Anonymity Privacy Protection Using Generalization and Suppression , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[8]  Philip S. Yu,et al.  Introduction to Privacy-Preserving Data Publishing: Concepts and Techniques , 2010 .

[9]  Thomas Cerqueus,et al.  Synthetic Data Generation using Benerator Tool , 2013, ArXiv.

[10]  Josep Domingo-Ferrer,et al.  Improving the Utility of Differentially Private Data Releases via k-Anonymity , 2013, 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications.

[11]  Chris Clifton,et al.  On syntactic anonymity and differential privacy , 2013, 2013 IEEE 29th International Conference on Data Engineering Workshops (ICDEW).

[12]  Benjamin C. M. Fung,et al.  Privacy-preserving data publishing , 2007 .

[13]  Chris Clifton,et al.  Thoughts on k-Anonymization , 2006, 22nd International Conference on Data Engineering Workshops (ICDEW'06).

[14]  Vijay S. Iyengar,et al.  Transforming data to satisfy privacy constraints , 2002, KDD.

[15]  Ninghui Li,et al.  t-Closeness: Privacy Beyond k-Anonymity and l-Diversity , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[16]  Josep Domingo-Ferrer,et al.  Ordinal, Continuous and Heterogeneous k-Anonymity Through Microaggregation , 2005, Data Mining and Knowledge Discovery.

[17]  ASHWIN MACHANAVAJJHALA,et al.  L-diversity: privacy beyond k-anonymity , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[18]  Alexandre Pinto,et al.  A Comparison of anonymization protection principles , 2012, 2012 IEEE 13th International Conference on Information Reuse & Integration (IRI).

[19]  U. Rovira,et al.  Chapter 6 A Quantitative Comparison of Disclosure Control Methods for Microdata , 2001 .

[20]  Roberto J. Bayardo,et al.  Data privacy through optimal k-anonymization , 2005, 21st International Conference on Data Engineering (ICDE'05).

[21]  Josep Domingo-Ferrer,et al.  Efficient multivariate data-oriented microaggregation , 2006, The VLDB Journal.

[22]  Qing Zhang,et al.  Aggregate Query Answering on Anonymized Tables , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[23]  Khaled El Emam,et al.  Practicing Differential Privacy in Health Care: A Review , 2013, Trans. Data Priv..

[24]  Philip S. Yu,et al.  Top-down specialization for information and privacy preservation , 2005, 21st International Conference on Data Engineering (ICDE'05).

[25]  Noboru Sonehara,et al.  ON ENHANCING DATA UTILITY IN K-ANONYMIZATION FOR DATA WITHOUT HIERARCHICAL TAXONOMIES , 2013 .

[26]  Lei Chen,et al.  A Survey of Privacy-Preservation of Graphs and Social Networks , 2010, Managing and Mining Graph Data.

[27]  Slava Kisilevich,et al.  Efficient Multidimensional Suppression for K-Anonymity , 2010, IEEE Transactions on Knowledge and Data Engineering.

[28]  Raymond Chi-Wing Wong,et al.  (α, k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing , 2006, KDD '06.

[29]  Lian Zhong Liu,et al.  Research on K-Anonymity Algorithm in Privacy Protection , 2013 .

[30]  Sushil Jajodia,et al.  Secure Data Management in Decentralized Systems , 2014, Secure Data Management in Decentralized Systems.

[31]  V. Torra,et al.  Comparing SDC Methods for Microdata on the Basis of Information Loss and Disclosure Risk , 2004 .

[32]  Charu C. Aggarwal,et al.  On k-Anonymity and the Curse of Dimensionality , 2005, VLDB.

[33]  Vitaly Shmatikov,et al.  Robust De-anonymization of Large Sparse Datasets , 2008, 2008 IEEE Symposium on Security and Privacy (sp 2008).

[34]  P. Joseph Gibson,et al.  An Enhanced Utility-Driven Data Anonymization Method , 2012, Trans. Data Priv..

[35]  Tamir Tassa,et al.  k-Concealment: An Alternative Model of k-Type Anonymity , 2012, Trans. Data Priv..

[36]  Panos Kalnis,et al.  Privacy-preserving anonymization of set-valued data , 2008, Proc. VLDB Endow..

[37]  David Sánchez,et al.  Semantic adaptive microaggregation of categorical microdata , 2012, Comput. Secur..

[38]  Pei-Chann Chang,et al.  Density-based microaggregation for statistical disclosure control , 2010, Expert Syst. Appl..

[39]  Ying Cai,et al.  Feeling-based location privacy protection for location-based services , 2009, CCS.

[40]  Jian Xu,et al.  Utility-based anonymization for privacy preservation with less information loss , 2006, SKDD.

[41]  Charu C. Aggarwal,et al.  On the design and quantification of privacy preserving data mining algorithms , 2001, PODS.

[42]  Tamir Tassa,et al.  A practical approximation algorithm for optimal k-anonymity , 2011, Data Mining and Knowledge Discovery.

[43]  Jean-Pierre Corriveau,et al.  A globally optimal k-anonymity method for the de-identification of health data. , 2009, Journal of the American Medical Informatics Association : JAMIA.

[44]  David J. DeWitt,et al.  Mondrian Multidimensional K-Anonymity , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[45]  Cynthia Dwork,et al.  Differential Privacy , 2006, ICALP.

[46]  A. Solanas,et al.  V-MDAV : A Multivariate Microaggregation With Variable Group Size , 2006 .

[47]  Josep Domingo-Ferrer,et al.  Practical Data-Oriented Microaggregation for Statistical Disclosure Control , 2002, IEEE Trans. Knowl. Data Eng..

[48]  Perry Cheng,et al.  Myths and realities: the performance impact of garbage collection , 2004, SIGMETRICS '04/Performance '04.

[49]  Anna Oganian,et al.  A Framework for Evaluating the Utility of Data Altered to Protect Confidentiality , 2006 .

[50]  Daniel Kifer,et al.  Injecting utility into anonymized datasets , 2006, SIGMOD Conference.

[51]  John Gantz,et al.  The Digital Universe in 2020: Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East , 2012 .

[52]  Alexandre V. Evfimievski,et al.  Privacy preserving mining of association rules , 2002, Inf. Syst..

[53]  Elisa Bertino,et al.  A Framework for Evaluating Privacy Preserving Data Mining Algorithms* , 2005, Data Mining and Knowledge Discovery.

[54]  Vitaly Shmatikov,et al.  The cost of privacy: destruction of data-mining utility in anonymized data publishing , 2008, KDD.

[55]  Urs Hengartner,et al.  A distributed k-anonymity protocol for location privacy , 2009, 2009 IEEE International Conference on Pervasive Computing and Communications.

[56]  Pierangela Samarati,et al.  Protecting Respondents' Identities in Microdata Release , 2001, IEEE Trans. Knowl. Data Eng..

[57]  Nitesh Kumar,et al.  Achieving k-anonymity Using Improved Greedy Heuristics for Very Large Relational Databases , 2013, Trans. Data Priv..

[58]  Massimo Barbaro,et al.  A Face Is Exposed for AOL Searcher No , 2006 .

[59]  Latanya Sweeney,et al.  k-Anonymity: A Model for Protecting Privacy , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[60]  Josep Domingo-Ferrer,et al.  A Survey of Inference Control Methods for Privacy-Preserving Data Mining , 2008, Privacy-Preserving Data Mining.

[61]  MurphyLiam,et al.  A Systematic Comparison and Evaluation of k-Anonymization Algorithms for Practitioners , 2014 .