Genetic Programming for the Induction of Decision Trees to Model Ecotoxicity Data

Automatic induction of decision trees and production rules from data, used to develop structure-activity models for toxicity prediction, has recently received much attention. Most methodologies reported in the literature are based on recursive partitioning, which employs a greedy search to choose the best splitting attribute and value at each node. These approaches can be successful; however, a greedy search will necessarily miss regions of the search space. Recent literature has demonstrated that genetic programming can be applied to decision tree induction to overcome this problem. This paper presents a variant of that approach which uses fewer mutation options and a simpler fitness function. Its utility in inducing decision trees for ecotoxicity data is demonstrated through a case study of two data sets, in which it gives improved accuracy and generalization ability over a popular decision tree inducer.
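
To illustrate the greedy step that recursive partitioning repeats at each node, the following is a minimal sketch, not the method used in this paper or in any particular inducer. It assumes numeric attributes, binary splits of the form "attribute <= threshold", and Gini impurity as the splitting criterion; the abstract does not name a criterion, and the function names `gini` and `best_greedy_split` and the toy data are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a collection of class labels (assumed criterion)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_greedy_split(
    rows: Sequence[Sequence[float]], labels: Sequence[int]
) -> Optional[Tuple[int, float, float]]:
    """Return (attribute index, threshold, weighted impurity) of the best
    single split, or None if no attribute has two distinct values."""
    n = len(rows)
    best = None
    for attr in range(len(rows[0])):
        values = sorted({row[attr] for row in rows})
        # Candidate thresholds lie midway between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2.0
            left = [y for row, y in zip(rows, labels) if row[attr] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[attr] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            # Keep the first split with the strictly lowest impurity.
            if best is None or score < best[2]:
                best = (attr, threshold, score)
    return best


# Toy XOR-style data: the class depends on both attributes jointly.
xor_rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_labels = [0, 1, 1, 0]
print(gini(xor_labels))                         # 0.5
print(best_greedy_split(xor_rows, xor_labels))  # (0, 0.5, 0.5)
```

On the XOR-style data, no single split lowers the impurity below that of the unsplit node, even though a two-level tree separates the classes perfectly. A purely greedy, one-step-ahead search therefore has no basis for choosing a useful first split here, which is one simple way the search can miss good regions of tree space.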