Efficient Near-Optimal Variable-Size Microaggregation

Microaggregation is a well-known family of statistical disclosure control methods that can also be used to achieve the k-anonymity privacy model and some of its extensions. Microaggregation can be viewed as a clustering problem in which every cluster must contain at least k elements. In this paper, we present a new microaggregation heuristic based on Lloyd’s clustering algorithm that causes much less information loss than the other microaggregation heuristics in the literature. In our experiments, this advantage held for every minimum cluster size k and every data set tried.
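To make the clustering view concrete, the following minimal Python sketch shows what a microaggregation step does once a partition is given: it checks that every cluster has at least k records, replaces each record by its cluster centroid, and measures information loss as the within-cluster sum of squared errors (SSE). This is not the heuristic proposed in the paper; the function names, the use of SSE as the loss measure, and the toy data are assumptions made for illustration only.

```python
from collections import defaultdict


def centroids(records, labels):
    """Return the mean record of each cluster, keyed by cluster label."""
    groups = defaultdict(list)
    for rec, lab in zip(records, labels):
        groups[lab].append(rec)
    return {
        lab: tuple(sum(col) / len(members) for col in zip(*members))
        for lab, members in groups.items()
    }


def is_valid_partition(labels, k):
    """True if every cluster contains at least k records."""
    counts = defaultdict(int)
    for lab in labels:
        counts[lab] += 1
    return all(count >= k for count in counts.values())


def microaggregate(records, labels, k):
    """Replace each record by the centroid of its cluster.

    Raises ValueError if some cluster has fewer than k records.
    """
    if not is_valid_partition(labels, k):
        raise ValueError("every cluster must contain at least k records")
    cents = centroids(records, labels)
    return [cents[lab] for lab in labels]


def sse(records, labels):
    """Within-cluster sum of squared errors (assumed loss measure)."""
    cents = centroids(records, labels)
    return sum(
        sum((x - c) ** 2 for x, c in zip(rec, cents[lab]))
        for rec, lab in zip(records, labels)
    )


# Hypothetical toy data: six two-attribute records in two clusters, k = 3.
records = [(1.0, 1.0), (2.0, 1.0), (3.0, 1.0),
           (10.0, 5.0), (11.0, 5.0), (12.0, 5.0)]
labels = [0, 0, 0, 1, 1, 1]

anonymized = microaggregate(records, labels, k=3)
loss = sse(records, labels)
```

A microaggregation heuristic then amounts to a search over label assignments that keeps `is_valid_partition` true while making `sse` as small as possible; allowing cluster sizes to vary above k gives that search more room than fixing every cluster at exactly k records.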
