New Multi-dimensional Sorting Based K-Anonymity Microaggregation for Statistical Disclosure Control

In recent years, there has been an alarming increase in online identity theft and in attacks that exploit personally identifiable information. The goal of privacy preservation is to dissociate individuals from the sensitive information held in microdata. Microaggregation techniques seek to protect microdata so that it can be published and mined without revealing private information that can be linked to specific individuals. Microaggregation works by partitioning the microdata into groups of at least k records and then replacing the records in each group with the centroid of the group. An optimal microaggregation method must minimize the information loss resulting from this replacement, and doing so is the central challenge. This paper presents a new microaggregation technique for Statistical Disclosure Control (SDC). It consists of two stages. In the first stage, the algorithm sorts all the records in the data set in a particular way to ensure that very dissimilar observations are never placed in the same cluster during microaggregation. In the second stage, an optimal microaggregation method is used to create k-anonymous clusters while minimizing the information loss. It takes the sorted data and simultaneously creates two distant clusters, using the two extreme sorted values as seeds. The performance of the proposed technique is compared against the most recent microaggregation methods. Experimental results on benchmark datasets show that the proposed algorithm has the lowest information loss among a range of techniques from the literature.
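The generic microaggregation step described above (groups of at least k records, each replaced by its group centroid) can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the abstract does not specify the multi-dimensional sort or the two-seed clustering, so the `sort_key` used here (sum of attribute values) is a hypothetical placeholder, and grouping is done by simply cutting the sorted list into consecutive blocks. Information loss is assumed to be the ratio SSE/SST that is commonly used in the microaggregation literature; the function names are illustrative only.

```python
from math import fsum


def microaggregate(records, k, sort_key=sum):
    """Group sorted records into blocks of at least k and replace each by its centroid.

    sort_key is a placeholder ordering (assumption), not the paper's sorting method.
    """
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    dim = len(records[0])
    # Order record indices by the placeholder key.
    order = sorted(range(len(records)), key=lambda i: sort_key(records[i]))
    # Cut into consecutive blocks of k; leftovers join the last block,
    # so every group has between k and 2k - 1 records.
    n_groups = len(order) // k
    groups = [order[g * k:(g + 1) * k] for g in range(n_groups)]
    groups[-1].extend(order[n_groups * k:])
    masked = [None] * len(records)
    for group in groups:
        centroid = tuple(
            fsum(records[i][d] for i in group) / len(group) for d in range(dim)
        )
        for i in group:
            masked[i] = centroid
    return masked, groups


def information_loss(records, masked):
    """SSE / SST: within-group squared error relative to total squared error (assumed measure)."""
    n, dim = len(records), len(records[0])
    mean = tuple(fsum(r[d] for r in records) / n for d in range(dim))
    sse = fsum((r[d] - m[d]) ** 2 for r, m in zip(records, masked) for d in range(dim))
    sst = fsum((r[d] - mean[d]) ** 2 for r in records for d in range(dim))
    return 0.0 if sst == 0 else sse / sst


# Toy data (made up for illustration): two well-separated clumps of records.
records = [(1.0, 2.0), (2.0, 1.0), (10.0, 11.0), (11.0, 10.0), (12.0, 12.0)]
masked, groups = microaggregate(records, k=2)
loss = information_loss(records, masked)
```

Every released record is identical to at least k - 1 others, which is what makes the output k-anonymous with respect to the aggregated attributes. A lower `loss` means the masked data stays closer to the original, which is the quantity the abstract says the proposed method minimizes.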
