A Hybrid Genetic Algorithm - Decision Tree Classifier

Studies of data mining classification algorithms have shown that these algorithms either sacrifice quality or scalability, or fail to uncover the structure of the data effectively. This paper presents a new approach for developing two C4.5-based classifiers. The first, RFC4.5, uses the RainForest framework, while the second, GARFC4.5, uses a genetic algorithm. The two classifiers were applied to a 20 MB medical database on thrombosis, obtained from the Discovery Challenge of the 3rd European Conference on Principles and Practice of Knowledge Discovery in Databases, held in Prague in 1999. The results show that both classifiers achieve higher classification accuracy than the traditional C4.5 classifier. For both classifiers, at a given population size, classification accuracy is found to increase with sample size.