Beyond De-Identification: Record Falsification to Disarm Expropriated Data-Sets

The wild enthusiasm for big data and open data has brought with it the assumptions that the utility of data-sets is what matters, and that privacy interests are to be sacrificed for the greater good. As a result, techniques have been devised to reduce the identifiability of expropriated data-records, on the assumption that privacy is to be compromised to the extent necessary. This paper argues for and adopts data privacy as the objective, and treats data utility for secondary purposes as the constraint. The inadequacies of both the concept and the implementation of de-identification are underlined. Synthetic data and Known Irreversible Record Falsification (KIRF) are identified as the appropriate techniques to protect against harm arising from expropriated data-sets.
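The core idea of KIRF is that a declared proportion of field values is overwritten with plausible but false data, with no mapping back to the originals retained, so that no individual record can be relied upon as true. The sketch below is a minimal illustration of that idea under stated assumptions; the function name, parameters, and the swap-based falsification strategy are illustrative choices, not the paper's specification.

```python
import random

def kirf_falsify(records, fields, rate=0.2, seed=None):
    """Illustrative sketch of Known Irreversible Record Falsification.

    A fraction `rate` of the named field values is overwritten with
    values drawn from the same column in other records, preserving
    aggregate distributions while falsifying individual records.
    No mapping from falsified to original values is kept, so the
    change is irreversible, and the output is openly labelled as
    falsified ("known").
    """
    rng = random.Random(seed)
    falsified = []
    for rec in records:
        out = dict(rec)
        for field in fields:
            if rng.random() < rate:
                # Draw a replacement from the column's own value pool;
                # the original value is discarded, not stored anywhere.
                donor = rng.choice(records)
                out[field] = donor[field]
        falsified.append(out)
    return {
        "kirf_falsified": True,   # the falsification is openly declared
        "falsification_rate": rate,
        "records": falsified,
    }
```

Because replacements are drawn from the column's own empirical distribution, column-level statistics remain approximately useful for secondary analysis, while any single record may be false, which is the privacy property KIRF aims for.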
