On Utilizing Association and Interaction Concepts for Enhancing Microaggregation in Secure Statistical Databases

This paper presents what may be a pioneering attempt to address microaggregation techniques (MATs) in secure statistical databases by drawing on the principles of associative neural networks (NNs). Prior art has improved the available solutions to the microaggregation problem by incorporating proximity information: the data set is recursively reduced by excluding the points that are farthest from the centroid, together with the points that are closest to those farthest points. Although this method is arguably extremely effective, it uses only proximity information and ignores the mutual interaction between records. In this paper, we argue that interrecord relationships can be quantified in terms of two entities: 1) their "association" and 2) their "interaction." Consequently, records that are not necessarily close to each other may still be "grouped," because their mutual interaction, which is quantified by invoking transitive-closure-like operations on the latter entity, could be significant, as suggested by the theoretically sound principles of NNs. By repeatedly invoking the interrecord associations and interactions, the records are grouped into clusters of cardinality k, where k is the security parameter of the algorithm. Our experimental results, obtained on artificial data and on benchmark real-life data sets, demonstrate that the newly proposed method is superior to the state of the art not only in terms of information loss (IL) but also with respect to a criterion that combines the IL and the disclosure risk (DR).
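For illustration only, the proximity-only baseline that the abstract contrasts against can be sketched in Python as follows. This is a simplified sketch, not the association/interaction method proposed in the paper. The function names, the squared Euclidean distance, and the definition of IL as the ratio of within-group to total sum of squares (a common choice in the microaggregation literature) are all assumptions, since the abstract does not specify them.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def dist2(a, b):
    """Squared Euclidean distance (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def proximity_microaggregate(records, k):
    """Hypothetical proximity-only grouping.

    Repeatedly takes the record farthest from the centroid of the
    remaining data, groups it with its nearest remaining records
    (k records in total), and removes that group from the data set.
    Assumes len(records) >= k; the leftover records form the last group.
    """
    remaining = list(range(len(records)))
    groups = []
    while len(remaining) >= 2 * k:
        c = centroid([records[i] for i in remaining])
        far = max(remaining, key=lambda i: dist2(records[i], c))
        group = sorted(remaining, key=lambda i: dist2(records[i], records[far]))[:k]
        groups.append(group)
        remaining = [i for i in remaining if i not in group]
    if remaining:
        groups.append(remaining)
    return groups


def information_loss(records, groups):
    """Assumed IL measure: within-group sum of squares over total sum of squares."""
    overall = centroid(records)
    sst = sum(dist2(r, overall) for r in records)
    sse = 0.0
    for g in groups:
        c = centroid([records[i] for i in g])
        sse += sum(dist2(records[i], c) for i in g)
    return sse / sst
```

Each group holds at least k records, so replacing every record by its group centroid hides it among at least k - 1 others. A lower IL means the masked data stay closer to the original. The sketch uses distances alone, which is the limitation that the association and interaction entities are meant to address.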
