Measuring Anonymity of Pseudonymized Data After Probabilistic Background Attacks

Organizations increasingly need to share their data for mining and other purposes without compromising the privacy of the individual objects the data contains. Pseudonymization is a simple yet widely employed technique for sanitizing such data prior to release: it replaces identifying names in the data with pseudonyms. Well-known metrics exist in the literature for measuring the amount of anonymity remaining in pseudonymized data in the aftermath of an infeasibility background attack. Although the need for a metric covering the much wider and more realistic class of probabilistic background attacks has long been identified, no such metric currently exists. We fill that gap by presenting two metrics, an approximate one and a more exact one, for measuring the anonymity of pseudonymized data in the wake of a probabilistic attack. Both metrics are computationally intractable and therefore impractical to apply directly in real-life settings, so we also develop an efficient heuristic for the more exact metric and demonstrate its high accuracy. Our metrics and heuristic help a data owner evaluate how safe pseudonymized data is against probabilistic attacks before deciding whether to release it.
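To make the notion of "amount of anonymity" concrete, the sketch below shows the classic entropy-based anonymity degree from the measurement literature this work builds on (it is an illustrative baseline, not the paper's own metrics): given the attacker's posterior probability distribution over the candidate identities behind a pseudonym, anonymity is the Shannon entropy of that distribution, normalized by its maximum.

```python
import math

def degree_of_anonymity(probs):
    """Normalized Shannon-entropy anonymity degree in [0, 1].

    probs: the attacker's posterior probabilities that each of the
    N candidate identities corresponds to a given pseudonym
    (assumed to sum to 1). Returns 1.0 for a uniform distribution
    (maximal anonymity) and 0.0 when one candidate is certain.
    """
    n = len(probs)
    if n <= 1:
        return 0.0
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return entropy / math.log2(n)

# Uniform posterior over 4 candidates: full anonymity.
print(degree_of_anonymity([0.25, 0.25, 0.25, 0.25]))  # 1.0
# Skewed posterior after a probabilistic attack: reduced anonymity.
print(degree_of_anonymity([0.7, 0.1, 0.1, 0.1]))
# Attack succeeded with certainty: no anonymity left.
print(degree_of_anonymity([1.0, 0.0, 0.0, 0.0]))      # 0.0
```

A probabilistic background attack shifts this posterior away from uniform, which is exactly the drop a system-wide metric must aggregate across all pseudonyms; doing so exactly involves matrix permanents and is what makes the exact metric intractable.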
