Practice and Challenges of (De-)Anonymisation for Data Sharing

Personal data is essential for research and innovation in many fields, and when such data is shared, the data controller carries the responsibility of protecting the privacy of the individuals represented in the dataset. Removing direct identifiers, such as full name and address, is not enough to protect individuals' privacy, as de-anonymisation methods in the scientific literature have shown. To comply with privacy regulations, data controllers need to become aware of the risks of de-anonymisation and apply appropriate anonymisation measures before sharing their datasets. To address this need, we define a procedure that makes data controllers aware of de-anonymisation risks and helps them decide which anonymisation measures to take in order to comply with the General Data Protection Regulation (GDPR). We showcase this procedure on a customer relationship management (CRM) dataset provided by a telecommunications provider. Finally, we recount the challenges we identified while defining this procedure and while putting existing knowledge and tools into practice.
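To make the core claim concrete, the following is a minimal, self-contained sketch of why removing direct identifiers does not suffice: combinations of quasi-identifiers (e.g. birth year, ZIP code, gender) can still single out individuals, and generalisation can raise the resulting k-anonymity. The toy records, field names, and generalisation steps below are illustrative assumptions, not the paper's actual CRM data or procedure.

from collections import Counter

# Hypothetical CRM-style records with direct identifiers (name,
# address) already removed. Quasi-identifiers remain.
records = [
    {"birth_year": 1984, "zip": "1010", "gender": "F", "plan": "premium"},
    {"birth_year": 1986, "zip": "1015", "gender": "F", "plan": "basic"},
    {"birth_year": 1979, "zip": "1020", "gender": "M", "plan": "basic"},
    {"birth_year": 1972, "zip": "1025", "gender": "M", "plan": "premium"},
]

QUASI_IDENTIFIERS = ("birth_year", "zip", "gender")

def k_anonymity(rows, qids):
    """Size of the smallest equivalence class over the quasi-identifiers,
    i.e. the k in k-anonymity. k == 1 means some record is unique and
    therefore at risk of re-identification via linkage."""
    classes = Counter(tuple(r[q] for q in qids) for r in rows)
    return min(classes.values())

def generalise(row):
    """One illustrative generalisation step: coarsen the birth year to a
    decade and truncate the ZIP code to its first two digits."""
    out = dict(row)
    out["birth_year"] = f"{row['birth_year'] // 10 * 10}s"
    out["zip"] = row["zip"][:2] + "**"
    return out

print("k before generalisation:", k_anonymity(records, QUASI_IDENTIFIERS))       # 1
generalised = [generalise(r) for r in records]
print("k after generalisation: ", k_anonymity(generalised, QUASI_IDENTIFIERS))   # 2

Here every original record is unique on its quasi-identifiers (k = 1), so an attacker with auxiliary data could link any row back to a person; after generalisation each equivalence class contains at least two records (k = 2). In practice, a data controller would tune such generalisation hierarchies with a dedicated tool rather than by hand.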
