Classification of epidemiological data: a comparison of genetic algorithm and decision tree approaches

Describes an application of genetic algorithms (GAs) to the classification of epidemiological data, which are often difficult to classify because of noise and other factors. For such complex data, where high accuracy requires a large number of very specific rules, smaller rule sets composed of more general rules may be preferable even if they are less accurate. The GA presented in this paper lets the user encourage smaller rule sets by setting a parameter. The rule sets it finds are also compared with those created by standard decision-tree algorithms. The results illustrate tradeoffs among the number of rules, descriptive accuracy, predictive accuracy, and accuracy in describing and predicting positive examples across the different rule sets.
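
One way a user-set parameter could encourage smaller rule sets is a fitness function that subtracts a per-rule penalty from accuracy. The following is a minimal sketch under that assumption, not the paper's actual method. The rule encoding, the toy data, and the names `fitness` and `size_penalty` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: a rule is a conjunction of attribute == value tests,
# and a rule set labels an example positive if any of its rules matches.
Rule = Dict[str, int]
Example = Tuple[Dict[str, int], bool]


def rule_matches(rule: Rule, features: Dict[str, int]) -> bool:
    """Return True if every condition in the rule holds for the example."""
    return all(features.get(attr) == value for attr, value in rule.items())


def predict(rule_set: List[Rule], features: Dict[str, int]) -> bool:
    """Classify as positive if at least one rule matches."""
    return any(rule_matches(rule, features) for rule in rule_set)


def accuracy(rule_set: List[Rule], data: List[Example]) -> float:
    """Fraction of examples whose predicted label equals the true label."""
    correct = sum(predict(rule_set, features) == label for features, label in data)
    return correct / len(data)


def fitness(rule_set: List[Rule], data: List[Example], size_penalty: float) -> float:
    """Accuracy minus an assumed linear penalty on the number of rules."""
    return accuracy(rule_set, data) - size_penalty * len(rule_set)


# Made-up toy data with two binary attributes, for illustration only.
data: List[Example] = [
    ({"a": 1, "b": 1}, True),
    ({"a": 1, "b": 0}, True),
    ({"a": 0, "b": 1}, True),
    ({"a": 0, "b": 0}, False),
]

# Three very specific rules that fit the data exactly ...
specific: List[Rule] = [{"a": 1, "b": 1}, {"a": 1, "b": 0}, {"a": 0, "b": 1}]
# ... versus one general rule that misclassifies a single example.
general: List[Rule] = [{"a": 1}]

for penalty in (0.0, 0.2):
    print(penalty, fitness(specific, data, penalty), fitness(general, data, penalty))
```

With `size_penalty` set to zero, the three-rule set scores higher because it is more accurate. With a large enough penalty, the single general rule scores higher despite its lower accuracy, which is the kind of tradeoff the abstract describes.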