Exploratory data analysis problems that involve large amounts of data but few hypotheses about them are modeled as search problems over large, unstructured spaces. For example, following the 1991 Persian Gulf conflict, medical evaluations of participants, conducted in response to reports of a "Gulf War Syndrome," have produced large amounts of medical data. This data, the basis for our investigation of the syndrome, ranges over more than 150 attributes, which makes searching the attribute space a hard problem. We solve it with a genetic algorithm intertwined with algorithms that operate on the detailed data. Computational results suggest that the system has performed a comprehensive search at low cost. Our findings: there is no indication yet of a single syndrome or other medical entity, but numerous correlations exist between exposure/demographic information and associated symptoms/diagnoses, and these merit further medical research.
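The abstract does not specify the encoding, operators, or fitness function, so the following is only a minimal sketch under stated assumptions. It shows why a genetic algorithm is a natural fit for this kind of problem. Each candidate is encoded as a bitstring with one bit per attribute (1 means the attribute is included in the pattern being evaluated). With 150 attributes this gives 2^150 (about 1.4 × 10^45) possible subsets, far too many to enumerate. Tournament selection, one-point crossover, and bit-flip mutation are generic textbook choices, not the authors' method. The `fitness` callback is a hypothetical parameter standing in for the part of their system that "operates on the detailed data."

```python
import random
from typing import Callable, List

# One bit per attribute: 1 = attribute included in the candidate pattern.
# This encoding is an assumption for illustration, not taken from the paper.
Genome = List[int]


def genetic_search(fitness: Callable[[Genome], float], n_bits: int,
                   pop_size: int = 40, generations: int = 60,
                   p_mut: float = 0.01, seed: int = 0) -> Genome:
    """Generic GA over attribute-subset bitstrings (hypothetical sketch)."""
    rng = random.Random(seed)
    # Random initial population of attribute subsets.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)  # best genome seen so far (kept across generations)

    for _ in range(generations):
        def pick() -> Genome:
            # Size-2 tournament: the fitter of two random individuals survives.
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b

        children: List[Genome] = []
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_bits)          # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: toggle each attribute with probability p_mut.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children

        gen_best = max(pop, key=fitness)
        if fitness(gen_best) > fitness(best):
            best = gen_best
    return best


if __name__ == "__main__":
    # Toy fitness (hypothetical): agreement with a hidden target subset.
    target = [i % 2 for i in range(150)]
    score = lambda g: sum(1 for a, b in zip(g, target) if a == b)
    result = genetic_search(score, n_bits=150)
    print(score(result), "of 150 bits match the toy target")
```

The toy scorer only exercises the search loop. A real application would substitute a hypothetical scorer that, for example, measures how strongly the selected exposure/demographic attributes correlate with a symptom or diagnosis in the medical records. Tracking `best` across generations means the returned subset is never worse than the best member of the initial population.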