Enhancing aggregation phase of microaggregation methods for interval disclosure risk minimization

Microaggregation is a masking mechanism for protecting confidential data in a public release. The technique can produce a k-anonymous dataset by partitioning the data records into groups of at least k members. For each group, a representative centroid is computed by aggregating the group members, and this centroid is published in place of the original records. Conventional microaggregation algorithms compute each centroid as the simple arithmetic mean of the group members. This naïve formulation does not consider how close the published values are to the original ones, so an intruder may be able to guess the original values. This paper proposes a disclosure-aware aggregation model in which published values are computed at a given distance from the original ones, yielding a published dataset that is both better protected and more useful. Empirical results show that the proposed method achieves a better trade-off between disclosure risk and information loss than comparable anonymization techniques.
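The conventional aggregation step described above can be illustrated with a minimal sketch. It assumes a single numerical attribute and a simple sort-and-chunk partition; the function names `microaggregate` and `fraction_within`, and the `tolerance` parameter, are hypothetical and chosen only for illustration. The proximity check is a simplified stand-in for an interval-style disclosure measure, and the sketch does not implement the disclosure-aware model proposed in the paper.

```python
from statistics import mean


def microaggregate(values, k):
    """Univariate fixed-size microaggregation with arithmetic-mean centroids.

    Illustrative assumption: records are grouped by sorting on the single
    attribute and cutting the sorted order into chunks of k records.
    """
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k records")
    # Sort record indices by value so that neighbours share a group.
    order = sorted(range(len(values)), key=lambda i: values[i])
    # Cut the sorted order into consecutive chunks of k records.
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    # Merge a short final chunk into the previous group so that
    # every group keeps at least k members.
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())
    published = [0.0] * len(values)
    for group in groups:
        # Conventional aggregation: the centroid is the group mean.
        centroid = mean(values[i] for i in group)
        for i in group:
            published[i] = centroid
    return published, groups


def fraction_within(values, published, tolerance):
    """Share of records whose published value lies within `tolerance`
    of the original value (a simplified proximity check)."""
    close = sum(abs(v - p) <= tolerance for v, p in zip(values, published))
    return close / len(values)


if __name__ == "__main__":
    original = [10, 12, 11, 50, 52, 51, 90]
    published, groups = microaggregate(original, k=3)
    print(published)
    print(fraction_within(original, published, tolerance=1))
```

In this sketch, tightly clustered records are published almost unchanged, which is the proximity problem the abstract attributes to mean-based centroids.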
