PRUDEnce: a System for Assessing Privacy Risk vs Utility in Data Sharing Ecosystems

Data describing human activities are an important source of knowledge for understanding individual and collective behavior and for developing a wide range of user services. Unfortunately, this kind of data is sensitive, because people's whereabouts may allow individuals to be re-identified in a de-identified database. Before sharing such data, Data Providers must therefore apply some form of anonymization to lower the privacy risks, but they must also be aware of, and able to control, the resulting data quality, since privacy and quality typically trade off against each other. In this paper we propose PRUDEnce (Privacy Risk versus Utility in Data sharing Ecosystems), a system enabling a privacy-aware ecosystem for sharing personal data. It is based on a methodology for assessing both the empirical (not theoretical) privacy risk associated with the users represented in the data and the data quality that can be guaranteed using only the users not at risk. Our proposal supports the Data Provider in exploring a repertoire of possible data transformations, with the aim of selecting one specific transformation that yields an adequate trade-off between data quality and privacy risk. We study the practical effectiveness of our proposal on three data formats that underlie many services, defined on real mobility data: presence data, trajectory data and road segment data.
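
The idea of an empirical, per-user risk assessment followed by a quality check on the users not at risk can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the data are treated as presence data (one set of visited locations per user), the adversary is assumed to know any h locations of a target, the risk of a user is taken as the worst-case probability of re-identification (one over the number of users matching the known locations), and quality is approximated by the share of users whose risk does not exceed a chosen threshold. The names empirical_risk, users_not_at_risk and the toy dataset are hypothetical.

```python
from itertools import combinations


def empirical_risk(data, h):
    """Worst-case re-identification risk per user (illustrative assumption).

    data: dict mapping a user id to the set of locations that user visited.
    h:    number of locations the adversary is assumed to know.
    """
    risks = {}
    for user, locations in data.items():
        # The adversary cannot know more locations than the user has.
        size = min(h, len(locations))
        worst = 0.0
        for known in combinations(sorted(locations), size):
            known = set(known)
            # Count the users whose data are compatible with this knowledge.
            # The target always matches, so matches >= 1.
            matches = sum(1 for other in data.values() if known <= other)
            worst = max(worst, 1.0 / matches)
        risks[user] = worst
    return risks


def users_not_at_risk(risks, threshold):
    """Users whose risk does not exceed the chosen threshold."""
    return {user for user, risk in risks.items() if risk <= threshold}


# Toy presence data: hypothetical users and locations.
toy = {
    "u1": {"A", "B"},
    "u2": {"A", "B"},
    "u3": {"A", "C"},
    "u4": {"D"},
}

risks = empirical_risk(toy, h=1)
safe = users_not_at_risk(risks, threshold=0.5)
coverage = len(safe) / len(toy)  # share of users that could be released
```

In a setting like the one the abstract describes, a Data Provider could repeat such a computation for each candidate transformation (for example, coarser locations) and compare the resulting risk distribution with the quality obtained from the users not at risk. The brute-force enumeration here is only suitable for small examples.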
