Data Mining by Decomposition: Adaptive Search for Hypothesis Generation

Data mining methods search large databases for interesting patterns that may inform useful decisions in organizations. When the database is defined over scores of attributes, the search becomes far more complex because of combinatorial explosion in the attribute space: billions of attribute subsets are candidates for forming interesting patterns. A useful way to address this complexity is to decompose the search problem and apply separate but intertwined algorithms for attribute search and pattern search. A genetic algorithm is applied to the attribute search problem to identify the subsets that lead to more interesting patterns. The method is applied to a real-world database arising from investigations into the "Persian Gulf Illness." In computational experiments, this approach was considerably more successful than random or manual attribute selection.
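
The decomposition described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`pattern_score`, `evolve_subsets`), the interest measure (highest target rate above the base rate among sufficiently supported value combinations), and all parameter values are assumptions chosen for illustration. The outer genetic algorithm searches over attribute subsets encoded as bit masks, and each candidate subset is scored by an inner pattern search restricted to those attributes.

```python
import random
from collections import defaultdict


def pattern_score(rows, target, subset, min_support=5):
    """Stand-in pattern search (an assumption, not the paper's measure).

    Groups rows by their values on `subset` and returns how far the best
    group's target rate lies above the overall base rate, considering only
    groups that cover at least `min_support` rows.
    """
    if not subset:
        return 0.0
    base = sum(target) / len(target)
    groups = defaultdict(list)
    for row, y in zip(rows, target):
        groups[tuple(row[i] for i in subset)].append(y)
    rates = [sum(ys) / len(ys) for ys in groups.values() if len(ys) >= min_support]
    return max(rates, default=base) - base


def evolve_subsets(rows, target, n_attrs, pop_size=20, generations=30,
                   mutation_rate=0.05, max_size=3, seed=0):
    """Genetic algorithm over attribute subsets encoded as 0/1 masks."""
    rng = random.Random(seed)

    def fitness(mask):
        subset = [i for i, bit in enumerate(mask) if bit]
        if len(subset) > max_size:  # penalize subsets too large to read
            return -1.0
        return pattern_score(rows, target, subset)

    def pick(pop):
        # Binary tournament selection.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    pop = [[int(rng.random() < 0.2) for _ in range(n_attrs)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward
        while len(children) < pop_size:
            p1, p2 = pick(pop), pick(pop)
            cut = rng.randrange(1, n_attrs)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)

    subset = [i for i, bit in enumerate(best) if bit]
    return subset, fitness(best)
```

Here `rows` is a list of equal-length attribute vectors and `target` is a parallel list of 0/1 outcomes; a call such as `evolve_subsets(rows, target, n_attrs=8)` returns the best attribute subset found and its score. Keeping the pattern search behind a single scoring function is what lets the two searches stay separate but intertwined: a different pattern-discovery routine could be swapped in without changing the genetic algorithm.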
