Microdata Protection Method Through Microaggregation: A Systematic Approach

Microdata protection in statistical databases has become a major societal concern and has been studied intensively in recent years. Statistical Disclosure Control (SDC) is often applied to statistical databases before they are released for public use. SDC seeks to protect microdata in such a way that they can be published and mined without revealing any private information that can be linked to specific individuals. Microaggregation is a family of SDC methods that protect microdata against individual identification. It works by partitioning the microdata into groups of at least k records and then replacing the records in each group with the centroid of that group. This paper presents a clustering-based microaggregation method that aims to minimize information loss. The proposed technique groups similar records together in a systematic way and then anonymizes each group by replacing its records with the group centroid. The structure of the systematic clustering problem is defined and investigated, and an algorithm for solving it is developed. Experimental results show that our method performs reasonably better, in terms of both information loss and execution time, than the most popular heuristic algorithm, Maximum Distance to Average Vector (MDAV).
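The general microaggregation step described above (partition into groups of at least k records, then replace each record with its group centroid) can be illustrated with a minimal sketch. This is neither the systematic clustering algorithm proposed in this paper nor an exact implementation of MDAV: the grouping rule below (seed on the record farthest from the centroid of the remaining records, then take its k nearest neighbors) is a simplified heuristic chosen only for illustration. The function names `centroid`, `microaggregate`, and `sse` are hypothetical, and the sum of squared errors is assumed here as the information-loss measure because it is commonly used in the microaggregation literature.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroid(records):
    """Component-wise mean of a non-empty list of numeric tuples."""
    n = len(records)
    return tuple(sum(col) / n for col in zip(*records))


def microaggregate(records, k):
    """Illustrative fixed-size heuristic (an assumption, not the paper's method).

    Returns (anonymized, groups): anonymized[i] is the centroid of the
    group containing record i; groups is a list of lists of record indices.
    """
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")

    remaining = list(range(len(records)))
    groups = []

    # Peel off groups of exactly k records while at least 2k remain,
    # so that the final group still has at least k records.
    while len(remaining) >= 2 * k:
        c = centroid([records[i] for i in remaining])
        # Seed: the remaining record farthest from the current centroid.
        seed = max(remaining, key=lambda i: dist(records[i], c))
        # Group: the seed and its k - 1 nearest remaining neighbors.
        group = sorted(remaining, key=lambda i: dist(records[i], records[seed]))[:k]
        groups.append(group)
        taken = set(group)
        remaining = [i for i in remaining if i not in taken]

    groups.append(remaining)  # last group holds between k and 2k - 1 records

    # Anonymize: every record is replaced by its group centroid.
    anonymized = [None] * len(records)
    for group in groups:
        c = centroid([records[i] for i in group])
        for i in group:
            anonymized[i] = c
    return anonymized, groups


def sse(records, anonymized):
    """Sum of squared distances between original and anonymized records."""
    return sum(dist(r, a) ** 2 for r, a in zip(records, anonymized))


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    anonymized, groups = microaggregate(data, k=3)
    print(groups)
    print(sse(data, anonymized))
```

Because every released record is shared by at least k originals, no record can be singled out within its group; a lower `sse` value indicates that the grouping kept similar records together and therefore lost less information.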
