A Game Theoretic Framework for Analyzing Re-Identification Risk

Personal data held in large databases offer a potential wealth of insights, so many organizations aim to share such data while protecting privacy by de-identifying it first. They remain concerned, however, because various demonstrations show that de-identified data can be re-identified. Yet these investigations focus on how attacks can be perpetrated, not on the likelihood that they will be realized. This paper introduces a game theoretic framework that enables a publisher to balance re-identification risk against the value of sharing data, leveraging the natural assumption that a recipient attempts re-identification only if its potential gains outweigh the costs. We apply the framework to a real case study, in which the value of the data to the publisher is the actual grant funding dollar amount from a national sponsor and the re-identification gain of the recipient is the fine paid to a regulator for violation of federal privacy rules. There are three notable findings: 1) it is possible to achieve zero risk, in that the recipient never gains from re-identification, while sharing almost as much data as the optimal solution that allows for a small amount of risk; 2) the zero-risk solution enables sharing much more data than a commonly invoked de-identification policy of the U.S. Health Insurance Portability and Accountability Act (HIPAA); and 3) a sensitivity analysis demonstrates that these findings are robust to order-of-magnitude changes in player losses and gains. Taken together, these findings support the view that such a framework can enable pragmatic policy decisions about de-identified data sharing.
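The core assumption, that a recipient attacks only when its expected gain exceeds its cost, can be illustrated with a toy leader-follower sketch. This is not the paper's model: the `Policy` class, the policy names, and every number below are hypothetical, chosen only to show how a publisher might compare a best overall policy with the best zero-risk one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """A hypothetical de-identification policy (illustrative values only)."""
    name: str
    benefit: float    # value to the publisher of sharing under this policy
    reid_prob: float  # assumed probability that an attack succeeds


def recipient_attacks(policy: Policy, gain: float, cost: float) -> bool:
    # The recipient attempts re-identification only if the expected gain
    # strictly exceeds the cost of mounting the attack.
    return policy.reid_prob * gain > cost


def publisher_payoff(policy: Policy, gain: float, cost: float, loss: float) -> float:
    # The publisher moves first and anticipates the recipient's best response.
    if recipient_attacks(policy, gain, cost):
        return policy.benefit - policy.reid_prob * loss
    return policy.benefit


def best_policy(policies, gain, cost, loss, zero_risk_only=False):
    # With zero_risk_only=True, keep only policies the recipient never attacks.
    candidates = [
        p for p in policies
        if not (zero_risk_only and recipient_attacks(p, gain, cost))
    ]
    return max(candidates, key=lambda p: publisher_payoff(p, gain, cost, loss))


# Hypothetical policy menu, from sharing nothing to sharing raw data.
POLICIES = [
    Policy("share_nothing", 0.0, 0.0),
    Policy("coarse", 60.0, 0.01),
    Policy("moderate", 90.0, 0.05),
    Policy("raw", 100.0, 0.60),
]

if __name__ == "__main__":
    overall = best_policy(POLICIES, gain=100.0, cost=10.0, loss=10.0)
    safe = best_policy(POLICIES, gain=100.0, cost=10.0, loss=10.0, zero_risk_only=True)
    print(overall.name, safe.name)
```

In this made-up setting the zero-risk choice gives up only a small part of the publisher's payoff relative to the unconstrained choice, which mirrors the qualitative shape of the first finding but says nothing about the magnitudes reported in the case study.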
