Process-Driven Data Privacy

The quantity of personal data that service providers gather from our daily activities continues to grow at a rapid pace. Sharing and subsequently analyzing such data can support a wide range of activities, but privacy concerns often prompt an organization to transform the data to satisfy a protection model (e.g., k-anonymity or ε-differential privacy). These models, however, are based on simplistic adversarial frameworks, which can lead to both under- and over-protection. For instance, such models often assume that an adversary attacks a protected record exactly once. We introduce a principled approach that explicitly models the attack process as a series of steps. Specifically, we engineer a factored Markov decision process (FMDP) to optimally plan an attack from the adversary's perspective and assess the privacy risk accordingly. The FMDP captures the uncertainty in the adversary's belief (e.g., the number of identified individuals that match the de-identified data) and enables the analysis of various real-world deterrence mechanisms beyond a traditional protection model, such as a penalty for committing an attack. We present an algorithm to solve the FMDP and illustrate its efficiency by simulating an attack on publicly accessible U.S. census records against a real identified resource of over 500,000 individuals in a voter registry. Our results demonstrate that, while traditional privacy models commonly expect an adversary to attack exactly once per record, an optimal attack in our model may involve exploiting none, one, or more individuals in the pool of candidates, depending on context.
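
To make the idea of a multi-step attack concrete, the following is a minimal sketch of a toy sequential decision problem solved by backward induction. It is not the FMDP or the solution algorithm of the paper. It assumes, purely for illustration, that exactly one of `pool_size` equally likely candidates is the true match, that each attempt costs `cost`, that a correct attempt yields `gain`, that a wrong attempt incurs `penalty` and eliminates that candidate, and that the adversary may stop at any time. The function name `plan_attack` and all parameters are hypothetical.

```python
def plan_attack(pool_size, gain, cost, penalty):
    """Backward induction over the number of remaining candidates.

    Toy assumptions (not taken from the paper): uniform belief over the
    remaining candidates, a wrong attempt removes one candidate, and the
    adversary can stop whenever continuing has non-positive expected value.
    Returns (values, policy), both indexed by the remaining pool size.
    """
    values = [0.0] * (pool_size + 1)      # values[0] = 0: nobody left to attack
    policy = ["stop"] * (pool_size + 1)
    for k in range(1, pool_size + 1):
        p_hit = 1.0 / k                   # chance the attacked candidate is the match
        attack_value = (
            -cost
            + p_hit * gain
            + (1.0 - p_hit) * (-penalty + values[k - 1])
        )
        if attack_value > 0.0:            # attack only if it beats stopping (value 0)
            values[k] = attack_value
            policy[k] = "attack"
    return values, policy


if __name__ == "__main__":
    # Without a penalty, attacking is worthwhile at every pool size.
    print(plan_attack(3, gain=10.0, cost=1.0, penalty=0.0)[1])
    # With a large penalty, the adversary attacks only a unique match.
    print(plan_attack(3, gain=10.0, cost=1.0, penalty=100.0)[1])
```

Even in this simplified setting, the planned number of attempts depends on context: with a large penalty the adversary makes no attempt unless the match is unique, while with no penalty the adversary keeps attacking until the match is found, which may take one or several attempts.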
