Flash: Efficient, Stable and Optimal K-Anonymity

K-anonymization is an important technique for the de-identification of sensitive datasets. In this paper, we briefly describe an implementation framework that has been carefully engineered to meet the needs of an important class of k-anonymity algorithms. We have implemented and evaluated two well-known algorithms within this framework and show that it allows for highly efficient implementations. We were able to closely reproduce the runtime behaviour reported in previous publications, but we also found some algorithmic limitations. Furthermore, we propose a new algorithm that achieves very good performance by implementing a novel strategy and exploiting different aspects of our implementation framework. In contrast to the current state of the art, our algorithm offers algorithmic stability: its execution time is independent of the actual representation of the input data. Experiments with different real-world datasets show that our solution clearly outperforms the previous algorithms.
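
For readers unfamiliar with the property itself, the following minimal sketch checks whether a table is k-anonymous: every combination of quasi-identifier values must be shared by at least k records. It illustrates only the definition, not the algorithm proposed in this paper or the framework it runs on. The function name `is_k_anonymous`, the column names, and the sample records are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    # Group records by their quasi-identifier values and count each group.
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())


# Hypothetical, already generalized records (illustrative data only).
rows = [
    {"age": "30-39", "zip": "816**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "816**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "817**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "817**", "diagnosis": "diabetes"},
]

print(is_k_anonymous(rows, ["age", "zip"], 2))
print(is_k_anonymous(rows, ["age", "zip"], 3))
```

Each of the two groups in the sample contains exactly two records, so the table satisfies the property for k = 2 but not for k = 3.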
