k-anonymity: Risks and the Reality

Datasets containing private and sensitive information are often useful for third-party data mining. To prevent identification of individuals, data owners release such data using privacy-preserving data publishing techniques. One well-known technique, k-anonymity, groups records by their quasi-identifiers so that every record in a group has exactly the same quasi-identifier values as every other record in that group. This reduces the worst-case probability of re-identifying a record from its quasi-identifiers to 1/k. The problem of optimal k-anonymisation is NP-hard. Depending on the k-anonymisation method used and the number of quasi-identifiers known to the attacker, the probability of re-identification can be lower than this worst-case guarantee. We quantify risk as the probability of re-identification and propose a mechanism to compute the empirical risk as a function of the cost of acquiring knowledge about quasi-identifiers, using a real-world dataset released with a k-anonymity guarantee. In addition, we show that k-anonymity can be harmful: knowledge of additional attributes beyond the quasi-identifiers can raise the probability of re-identification.
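To make the 1/k worst-case bound concrete, the grouping idea can be sketched as follows. This is an illustrative example only, not the mechanism proposed in the paper: the table, column names, and the `effective_k` helper are all hypothetical, and the snippet simply computes the smallest equivalence class induced by a chosen set of quasi-identifiers.

```python
from collections import Counter

def effective_k(records, quasi_identifiers):
    """Size of the smallest group of records sharing identical
    quasi-identifier values; the worst-case re-identification
    probability from those attributes alone is 1 / effective_k."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(groups.values())

# Hypothetical generalised table: age band and truncated ZIP code
# serve as quasi-identifiers, "disease" is the sensitive attribute.
table = [
    {"age": "30-39", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "130**", "disease": "cold"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "cancer"},
]

k = effective_k(table, ["age", "zip"])
print(k, 1 / k)  # smallest group size 2, so worst-case probability 0.5
```

An attacker who also learns a non-quasi-identifier attribute (here, `disease`) can often narrow a group further, which is exactly the additional risk the abstract points out.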
