Pseudonymization risk analysis in distributed systems

In an era of big data, online services are becoming increasingly data-centric: they collect, process, analyze and disclose growing amounts of personal data in pseudonymized form. It is crucial that such systems are engineered both to protect the privacy of each individual user (data subject) and to give users back control over their personal data. For pseudonymized data, this means that unwanted parties should not be able to deduce sensitive information about a user. However, the plethora of pseudonymization algorithms and tuneable parameters currently available makes it difficult for a non-expert developer (data controller) to understand and realise strong privacy guarantees. In this paper we propose a principled Model-Driven Engineering (MDE) framework that models data services in terms of their pseudonymization strategies and identifies risks of breaches of user privacy. A developer can explore alternative pseudonymization strategies and assess their effectiveness using quantifiable metrics: i) violations of the privacy requirements of every user in the current data set; and ii) the trade-off between conforming to these requirements and the usefulness of the data for its intended purposes. We demonstrate through an experimental evaluation that the information provided by the framework is useful, particularly in complex situations where privacy requirements differ between users, and that it can inform decisions to optimize a chosen strategy rather than applying an off-the-shelf algorithm.
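To make the first metric concrete, the sketch below counts per-user k-anonymity violations in a pseudonymized data set where each data subject may demand a different k. This is only an illustration of the kind of check the framework performs; the function name, record layout, and the use of k-anonymity as the privacy model are assumptions, not the framework's actual API.

```python
from collections import Counter

def k_anonymity_violations(records, quasi_identifiers, k_required):
    """Return the indices of records whose quasi-identifier equivalence
    class is smaller than the k required by that data subject.

    records: list of dicts, one per data subject (illustrative layout)
    quasi_identifiers: attribute names treated as quasi-identifiers
    k_required: dict mapping record index -> that subject's required k
    """
    # Group records into equivalence classes by quasi-identifier values.
    class_sizes = Counter(
        tuple(r[a] for a in quasi_identifiers) for r in records
    )
    violations = []
    for i, r in enumerate(records):
        size = class_sizes[tuple(r[a] for a in quasi_identifiers)]
        if size < k_required.get(i, 2):  # assume a default of k=2
            violations.append(i)
    return violations
```

Because `k_required` is per-record, the same generalized data set can satisfy one user's requirement while violating a stricter neighbour's, which is exactly the situation where a single off-the-shelf algorithm with one global k falls short.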
