A Pairwise-Systematic Microaggregation for Statistical Disclosure Control

Microdata protection in statistical databases has become a major societal concern and has been studied intensively in recent years. Statistical Disclosure Control (SDC) is often applied to statistical databases before they are released for public use; it seeks to protect microdata so that they can be published and mined without revealing private information that can be linked to specific individuals. Microaggregation is a family of SDC methods that protect microdata from individual identification. It works by partitioning the microdata into groups of at least k records and then replacing the records in each group with the group centroid. An optimal microaggregation method must minimize the information loss resulting from this replacement, and achieving this is the central challenge. This paper presents a pairwise-systematic (P-S) microaggregation method to minimize the information loss. The proposed technique forms two distant groups at a time, systematically placing similar records together in each, and then anonymizes each group with its own centroid. The structure of the P-S problem is defined and investigated, and an algorithm for it is developed. The performance of the P-S algorithm is compared against the most recent microaggregation methods. Experimental results show that the P-S algorithm incurs less than half the information loss of the latest microaggregation methods in all of the test situations.
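
The partition-and-replace step described above can be illustrated with a short Python sketch. It is not the P-S algorithm itself, whose details the abstract does not give. The function names, the seed-selection rule (two mutually distant seeds per round, each grouped with its nearest records), and the use of the SSE/SST ratio as the information-loss measure are all assumptions made for illustration.

```python
from math import dist


def centroid(records):
    """Component-wise mean of a non-empty list of numeric tuples."""
    n = len(records)
    return tuple(sum(col) / n for col in zip(*records))


def pairwise_groups(records, k):
    """Hypothetical heuristic (an assumption, not the paper's method):
    repeatedly pick two distant seed records and build one group of k
    records around each, so that every group has at least k members."""
    if k < 1 or len(records) < k:
        raise ValueError("need at least k records and k >= 1")
    remaining = set(range(len(records)))
    groups = []

    def take_nearest(seed):
        # Group the k remaining records closest to the seed record.
        group = sorted(remaining, key=lambda i: dist(records[i], records[seed]))[:k]
        groups.append(group)
        remaining.difference_update(group)

    def farthest_from_centroid():
        c = centroid([records[i] for i in remaining])
        return max(remaining, key=lambda i: dist(records[i], c))

    # Form two distant groups per round while enough records remain
    # for a later group of at least k.
    while len(remaining) >= 3 * k:
        r = farthest_from_centroid()
        s = max(remaining, key=lambda i: dist(records[i], records[r]))
        take_nearest(r)
        take_nearest(s)
    if len(remaining) >= 2 * k:
        take_nearest(farthest_from_centroid())
    if remaining:
        groups.append(sorted(remaining))  # final group has k to 2k-1 records
    return groups


def microaggregate(records, groups):
    """Replace every record with the centroid of its group."""
    masked = list(records)
    for group in groups:
        c = centroid([records[i] for i in group])
        for i in group:
            masked[i] = c
    return masked


def information_loss(records, groups):
    """Within-group sum of squares (SSE) divided by the total sum of
    squares (SST); 0 means no loss, 1 means all variation is lost."""
    overall = centroid(records)
    sst = sum(dist(r, overall) ** 2 for r in records)
    sse = 0.0
    for group in groups:
        c = centroid([records[i] for i in group])
        sse += sum(dist(records[i], c) ** 2 for i in group)
    return sse / sst if sst else 0.0
```

Calling `pairwise_groups(records, k)` on a list of numeric tuples returns index groups that can be passed to `microaggregate` to obtain the masked records and to `information_loss` to score the partition. A lower ratio indicates that the masked data preserve more of the original variation.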
