Multivariate Microaggregation Based Genetic Algorithms

Microaggregation is a clustering problem with cardinality constraints (every cluster must contain at least k records) that originated in the area of statistical disclosure control for microdata. This article presents a method for multivariate microaggregation based on genetic algorithms (GA). The adaptations required to characterize the multivariate microaggregation problem are explained and justified. Extensive experimentation has been carried out to find the best values for the most relevant parameters of the modified GA: the population size and the crossover and mutation rates. The experimental results show that, on small data sets, our method finds the optimal solution in almost all experiments. Thus, for small data sets the proposed method performs better than known polynomial heuristics, and it can be combined with them for larger data sets. Moreover, a sensitivity analysis is reported that shows how each parameter influences the results and which values work best.
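The abstract does not spell out the chromosome encoding or the genetic operators, so the following is only a minimal, generic sketch of a GA for multivariate microaggregation under stated assumptions, not the authors' method. Assumptions (all hypothetical): each chromosome stores one group label per record; fitness is the within-group sum of squared errors (SSE) plus a penalty for every group whose size lies outside [k, 2k - 1] (an optimal k-partition is known to use only such group sizes); the operators are tournament selection, uniform crossover, per-gene mutation, and elitism. The names `group_sse`, `fitness`, `evolve` and all default values are illustrative; `pop_size`, `crossover_rate`, and `mutation_rate` stand in for the three parameters the abstract says were tuned.

```python
import random
from typing import Dict, List, Sequence, Tuple

Data = Sequence[Sequence[float]]


def group_sse(points: Data) -> float:
    """Sum of squared Euclidean distances from each point to its group centroid."""
    d = len(points[0])
    centroid = [sum(p[j] for p in points) / len(points) for j in range(d)]
    return sum((p[j] - centroid[j]) ** 2 for p in points for j in range(d))


def fitness(labels: Sequence[int], data: Data, k: int, penalty: float) -> float:
    """SSE of the partition encoded by `labels`, plus `penalty` for each group
    whose size falls outside [k, 2k - 1] (hypothetical penalty scheme)."""
    groups: Dict[int, List[Sequence[float]]] = {}
    for label, point in zip(labels, data):
        groups.setdefault(label, []).append(point)
    sse = sum(group_sse(g) for g in groups.values())
    violations = sum(1 for g in groups.values() if not k <= len(g) <= 2 * k - 1)
    return sse + penalty * violations


def evolve(data: Data, k: int, pop_size: int = 50, crossover_rate: float = 0.8,
           mutation_rate: float = 0.05, generations: int = 200,
           seed: int = 0) -> Tuple[List[int], float]:
    """Generic GA sketch for k-microaggregation; returns (best labels, fitness)."""
    if len(data) < k:
        raise ValueError("need at least k records")
    rng = random.Random(seed)
    n, max_groups = len(data), len(data) // k
    # Every partition has SSE <= SST (total sum of squares), so a penalty of
    # SST + 1 per violated group makes any infeasible partition score worse
    # than any feasible one.
    penalty = group_sse(data) + 1.0

    def score(ind: List[int]) -> float:
        return fitness(ind, data, k, penalty)

    def random_feasible() -> List[int]:
        # Shuffle the records and cut them into n // k consecutive groups;
        # the last group absorbs the n % k leftovers, so sizes stay in [k, 2k - 1].
        order = list(range(n))
        rng.shuffle(order)
        labels = [0] * n
        for rank, idx in enumerate(order):
            labels[idx] = min(rank // k, max_groups - 1)
        return labels

    def tournament(pop: List[List[int]], scores: List[float]) -> List[int]:
        # Binary tournament: the lower-fitness (better) of two random picks wins.
        i, j = rng.randrange(pop_size), rng.randrange(pop_size)
        return pop[i] if scores[i] <= scores[j] else pop[j]

    population = [random_feasible() for _ in range(pop_size)]
    for _ in range(generations):
        scores = [score(ind) for ind in population]
        elite = population[min(range(pop_size), key=scores.__getitem__)]
        next_pop = [elite[:]]  # elitism: the best individual survives unchanged
        while len(next_pop) < pop_size:
            a, b = tournament(population, scores), tournament(population, scores)
            if rng.random() < crossover_rate:
                # Uniform crossover: each record's label comes from a or b.
                child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            else:
                child = a[:]
            for i in range(n):
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(max_groups)  # move record i
            next_pop.append(child)
        population = next_pop
    best = min(population, key=score)
    return best, score(best)


if __name__ == "__main__":
    # Tiny illustrative data set: three well-separated 1-D clusters, k = 3.
    data = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,), (10.0,), (10.1,), (10.2,)]
    labels, sse = evolve(data, k=3)
    print(labels, sse)
```

Because the initial population is feasible and the elite is always kept, the returned partition is always feasible, and its fitness equals its SSE. Uniform crossover on raw labels ignores the fact that relabeling groups yields the same partition; a serious encoding would need to handle that redundancy, which this sketch omits for brevity.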
