A Novel Method for Micro-Aggregation in Secure Statistical Databases Using Association and Interaction

We consider the problem of micro-aggregation in secure statistical databases by enhancing the primitive Micro-Aggregation Technique (MAT), which incorporates proximity information. The state-of-the-art MAT recursively reduces the size of the data set by excluding the points that are farthest from the centroid and those that are closest to these farthest points, but it ignores the mutual Interaction between the records. In this paper, we argue that inter-record relationships can be quantified in terms of two entities, namely their "Association" and their "Interaction". Based on the theoretically sound principles of neural networks (NN), we believe that the proximity information can be quantified using the mutual Association, and that the mutual Interaction can be quantified by invoking transitive-closure-like operations on the Associations. By repeatedly invoking the inter-record Associations and Interactions, the records are partitioned into groups of cardinality "k", where k is the security parameter of the algorithm. Our experimental results, obtained on artificial data and on benchmark real-life data sets, demonstrate that the newly proposed method is superior to the state-of-the-art by as much as 13%.
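To make the baseline concrete, the following is a minimal Python sketch of the proximity-only heuristic described above: repeatedly pick the record farthest from the centroid, group it with its nearest remaining records, and remove that group from the data set. It is an illustration under stated assumptions, not the method proposed in the paper; the Association and Interaction measures are not modelled. The names `centroid` and `microaggregate`, the use of Euclidean distance, and the rule that the leftover records form the last group are all assumptions made for this example.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def microaggregate(records, k):
    """Partition record indices into groups of at least k records.

    Hypothetical helper: a simplified fixed-size heuristic that uses
    proximity only (farthest-from-centroid seeding).
    """
    remaining = list(range(len(records)))
    groups = []
    # Peel off groups of exactly k while at least 2k records remain,
    # so that the leftover set never has fewer than k records.
    while len(remaining) >= 2 * k:
        c = centroid([records[i] for i in remaining])
        # Seed: the remaining record farthest from the current centroid.
        far = max(remaining, key=lambda i: dist(records[i], c))
        # Its k - 1 nearest neighbours among the other remaining records.
        others = sorted(
            (i for i in remaining if i != far),
            key=lambda i: dist(records[i], records[far]),
        )
        group = [far] + others[: k - 1]
        groups.append(group)
        chosen = set(group)
        remaining = [i for i in remaining if i not in chosen]
    # Assumption: whatever is left (between k and 2k - 1 records when
    # the input has at least k records) forms the final group.
    if remaining:
        groups.append(remaining)
    return groups


records = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
groups = microaggregate(records, k=3)
```

In a full micro-aggregation pipeline, each record would then be replaced by the centroid of its group before release; that step is omitted here to keep the sketch short.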