Fine granular proximity breach prevention during numerical data anonymization

Microaggregation is a well-established perturbative mechanism for achieving k-anonymity. It partitions the dataset into groups of at least k records and replaces the values of each group's members with an aggregate of the group; these aggregated values are published instead of the original ones. Conventional microaggregation methods aim to produce a protected dataset that is as similar as possible to the original one, so nearby records are placed in the same group. Accordingly, their aggregation phase is designed to minimize the within-group sum of squared errors (SSE), and each group's centroid is therefore computed as the simple arithmetic mean of its members. However, this approach ignores how close the published values are to the original ones, so an intruder can use the published data to narrow down the range in which the original values lie. In this paper, a proximity-aware post-processing algorithm for microaggregation is proposed that revisits the aggregation step to remedy this deficiency. In addition, the algorithm can enforce different levels of minimum required distance between original record values and their corresponding published values. Empirical results show that, compared with similar microaggregation techniques, the proposed method achieves a better tradeoff between disclosure risk and information loss.
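The proximity problem can be made concrete for a single numerical attribute. The Python sketch below is a hypothetical illustration, not the paper's algorithm: the paper targets multivariate microaggregation, and its post-processing details are not given here. Groups are formed by a simple fixed-size univariate heuristic (sort, then cut into runs of k), and all function names and the toy data are invented for illustration. The sketch relies on the identity SSE(c) = SSE(mean) + n(c - mean)^2 for a group of n values. Under the constraint that every member x stays at least d away from the published value c, the SSE-optimal choice is therefore the feasible point nearest the mean. That point is either the mean itself or an endpoint x - d or x + d of one of the forbidden intervals.

```python
from statistics import fmean

def microaggregate_univariate(values, k):
    """Hypothetical fixed-size univariate microaggregation: sort the records,
    cut them into consecutive groups of k, and merge a short tail (< k records)
    into the previous group so every group has at least k members."""
    if not 1 <= k <= len(values):
        raise ValueError("need 1 <= k <= len(values)")
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())
    return groups

def sse(members, c):
    """Within-group squared error of publishing c for every member."""
    return sum((x - c) ** 2 for x in members)

def proximity_aware_centroid(members, min_dist, tol=1e-9):
    """Value c minimizing sse(members, c) subject to |x - c| >= d for every
    member x, where min_dist is one distance d or a list of per-record d's.
    Since sse(members, c) = sse(members, mean) + n * (c - mean)**2, the optimum
    is the feasible point nearest the mean: the mean itself, or an endpoint
    x - d / x + d of some forbidden open interval (x - d, x + d)."""
    if isinstance(min_dist, (list, tuple)):
        ds = list(min_dist)
    else:
        ds = [min_dist] * len(members)
    mean = fmean(members)

    def feasible(c):
        # tol absorbs floating-point rounding in x + d
        return all(abs(x - c) + tol >= d for x, d in zip(members, ds))

    candidates = [mean] + [x + s * d for x, d in zip(members, ds) for s in (-1, 1)]
    # min(x - d) is always feasible, so the generator is never empty
    return min((c for c in candidates if feasible(c)), key=lambda c: abs(c - mean))

def anonymize(values, k, min_dist=0.0):
    """Publish one proximity-aware centroid per group (uniform min_dist)."""
    published = [0.0] * len(values)
    for group in microaggregate_univariate(values, k):
        c = proximity_aware_centroid([values[i] for i in group], min_dist)
        for i in group:
            published[i] = c
    return published

def proximity_breaches(values, published, d):
    """Number of records whose published value lies closer than d to the original."""
    return sum(abs(x - p) < d for x, p in zip(values, published))

values = [10, 11, 12, 30, 31, 35, 50]
plain = anonymize(values, k=3)              # arithmetic-mean centroids
safe = anonymize(values, k=3, min_dist=2)   # [8, 8, 8, 37, 37, 37, 37]
print(proximity_breaches(values, plain, 2), proximity_breaches(values, safe, 2))  # prints: 4 0
```

With plain means, four of the seven published values lie within distance 2 of the original, so an intruder learns those originals to within that range. The proximity-aware centroids remove every such breach. The cost is information loss: for the first group, publishing 8 instead of 11 raises the SSE from 2 to 29. This is the disclosure-risk versus information-loss tradeoff that the abstract refers to.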
