Decision support for releasing anonymised data

Abstract: For legal and privacy reasons, databases containing sensitive personal data may often be published only in anonymised form. History shows, however, that the privacy of anonymised data is in many cases easily broken by de-anonymisation attacks. This paper defines guiding principles for decisions about releasing anonymised data and provides a simple process for analysing de-anonymisation risk and deciding whether to publish anonymised personal data. At the heart of this process is an information-theoretic de-anonymisation feasibility limit that is independent of the details of both the anonymisation procedure and the adversarial de-anonymisation algorithms. This feasibility limit relates the adversarial mutual information between the anonymised data and the attacker's background information to the number of records in the anonymised database and the acceptable risk of privacy violations. Based on this result, we explain, discuss and exemplify the process for making decisions about releasing anonymised data.
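The abstract does not state the feasibility limit itself, so the following is only a rough sketch of the kind of relationship it describes. It uses the standard weak form of Fano's inequality as a stand-in, which is not necessarily the paper's own limit. The sketch assumes that the target is equally likely to be any of the N records and that the attacker's mutual information is measured in bits; the function names and the decision rule are hypothetical.

```python
import math


def fano_success_upper_bound(mutual_information_bits: float, num_records: int) -> float:
    """Upper-bound the attacker's probability of picking the right record.

    Assumption: the target is uniformly distributed over num_records records.
    Weak Fano inequality: P_error >= 1 - (I + 1) / log2(N),
    hence P_success <= (I + 1) / log2(N).
    """
    if mutual_information_bits < 0:
        raise ValueError("mutual information cannot be negative")
    if num_records < 2:
        # With a single record there is nothing left to guess.
        return 1.0
    bound = (mutual_information_bits + 1.0) / math.log2(num_records)
    # A probability bound above 1 carries no information, so clip it.
    return min(1.0, bound)


def release_is_acceptable(mutual_information_bits: float,
                          num_records: int,
                          acceptable_risk: float) -> bool:
    """Hypothetical decision rule: release only if the bound stays within the risk."""
    return fano_success_upper_bound(mutual_information_bits, num_records) <= acceptable_risk


if __name__ == "__main__":
    # Illustrative numbers only: 4 bits of background information, 2**20 records.
    print(fano_success_upper_bound(4.0, 2**20))
    print(release_is_acceptable(4.0, 2**20, acceptable_risk=0.3))
```

Under these assumptions, the bound tightens as the number of records grows and loosens as the attacker's mutual information grows, which matches the qualitative dependence the abstract describes.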
