Improving privacy preservation policy in the modern information age

Anonymization or de-identification techniques are methods for protecting the privacy of human subjects in sensitive data sets while preserving the utility of those data sets. In the case of health data, anonymization techniques may be used to remove or mask patient identities while allowing the health data content to be used by the medical and pharmaceutical research community. The efficacy of anonymization methods has come under repeated attack, and several researchers have shown that anonymized data can be re-identified to reveal the identities of the data subjects via approaches such as "linking." Nevertheless, despite these deficiencies, many government privacy policies depend on anonymization techniques as the primary approach to preserving privacy. In this report, we survey the anonymization landscape and consider the range of anonymization approaches that can be used to de-identify data containing personally identifiable information. We then review several notable government privacy policies that leverage anonymization. In particular, we review the European Union's General Data Protection Regulation (GDPR) and show that it takes a more goal-oriented approach to data privacy than its predecessors. It defines data privacy in terms of the desired outcome (i.e., as a defense against the risk of personal data disclosure), and is agnostic to the actual method of privacy preservation. The GDPR goes further, framing its privacy preservation regulations relative to the state of the art, the cost of implementation, the incurred risks, and the context of data processing. This has potential implications for the GDPR's robustness to future technological innovations, in marked contrast to privacy regulations that depend explicitly on more definite technical specifications.
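The "linking" re-identification attack mentioned above can be sketched in a few lines of Python: an attacker joins a de-identified table with a public roster on shared quasi-identifiers (e.g., ZIP code, birth year, sex), and any record matching exactly one roster entry is re-identified. All tables, names, and field values below are hypothetical, constructed only to illustrate the mechanism:

```python
# Toy linkage attack: join an "anonymized" health table (direct
# identifiers removed) against a public roster on quasi-identifiers.
# All data below is fabricated for illustration.

anonymized_health = [
    {"zip": "02138", "birth_year": 1965, "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_year": 1972, "sex": "M", "diagnosis": "asthma"},
]

public_roster = [
    {"name": "Alice Adams", "zip": "02138", "birth_year": 1965, "sex": "F"},
    {"name": "Bob Brown",   "zip": "02139", "birth_year": 1972, "sex": "M"},
    {"name": "Carol Chan",  "zip": "02139", "birth_year": 1980, "sex": "F"},
]

QUASI_IDS = ("zip", "birth_year", "sex")

def link(health_rows, roster_rows):
    """Return (name, diagnosis) pairs for health records whose
    quasi-identifiers match exactly one person in the roster."""
    reidentified = []
    for h in health_rows:
        key = tuple(h[q] for q in QUASI_IDS)
        matches = [r for r in roster_rows
                   if tuple(r[q] for q in QUASI_IDS) == key]
        if len(matches) == 1:  # unique match => identity revealed
            reidentified.append((matches[0]["name"], h["diagnosis"]))
    return reidentified

print(link(anonymized_health, public_roster))
# → [('Alice Adams', 'hypertension'), ('Bob Brown', 'asthma')]
```

Defenses such as k-anonymity work by generalizing or suppressing the quasi-identifiers so that every released record is indistinguishable from at least k-1 others, which makes the unique-match condition in the join above fail.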
