Parallel genetic programming for decision tree induction

A parallel genetic programming approach to inducing decision trees from large data sets is presented. A population of trees is evolved by applying genetic operators, and each individual is evaluated with a fitness function based on the J-measure. The method can handle large data sets because it uses a parallel implementation of genetic programming based on the grid (cellular) model, together with an out-of-core technique for data sets that do not fit in main memory. Preliminary experiments on data sets from the UCI Machine Learning Repository show good classification results and assess the scalability of the method.
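
The abstract does not state exactly how the J-measure becomes a tree's fitness, so the following is only a minimal sketch under stated assumptions. It uses the Smyth and Goodman J-measure of a rule "if Y=y then X=x", J = p(y) * [p(x|y) log2(p(x|y)/p(x)) + (1 - p(x|y)) log2((1 - p(x|y))/(1 - p(x)))], and assumes (hypothetically) that each leaf is read as a rule predicting its majority class and that the tree's fitness is the sum of its leaves' J-measures. The names `j_measure`, `tree_fitness` and `leaf_labels` are illustrative and not taken from the paper.

```python
from collections import Counter
import math


def _term(p: float, q: float) -> float:
    """Return p * log2(p / q), using the convention 0 * log(0 / q) = 0."""
    if p == 0.0:
        return 0.0
    return p * math.log2(p / q)


def j_measure(p_y: float, p_x: float, p_x_given_y: float) -> float:
    """J-measure of the rule 'if Y=y then X=x' (Smyth and Goodman).

    p_y:         probability that the rule's condition holds
    p_x:         prior probability of the predicted class x
    p_x_given_y: probability of class x among examples satisfying the condition
    """
    # If the class prior is 0 or 1, the rule cannot carry any information.
    if not 0.0 < p_x < 1.0:
        return 0.0
    j = _term(p_x_given_y, p_x) + _term(1.0 - p_x_given_y, 1.0 - p_x)
    return p_y * j


def tree_fitness(leaf_labels: list[list[str]], all_labels: list[str]) -> float:
    """Hypothetical fitness: the sum of the J-measures of the tree's leaves.

    leaf_labels: class labels of the training examples reaching each leaf
    all_labels:  class labels of the whole training set
    Assumption: each leaf is read as a rule predicting its majority class.
    """
    n = len(all_labels)
    prior = Counter(all_labels)
    total = 0.0
    for labels in leaf_labels:
        if not labels:
            continue  # an empty leaf covers no examples and contributes nothing
        cls, hits = Counter(labels).most_common(1)[0]
        total += j_measure(len(labels) / n, prior[cls] / n, hits / len(labels))
    return total


if __name__ == "__main__":
    data = ["a", "a", "b", "b"]
    # Pure leaves are informative; a single mixed leaf is not.
    print(tree_fitness([["a", "a"], ["b", "b"]], data))
    print(tree_fitness([["a", "b", "a", "b"]], data))
```

Under this assumption, a tree whose leaves separate the classes cleanly scores higher than one whose leaves reproduce the class prior, because such a leaf has a J-measure of zero. The J-measure also weights each leaf by its coverage p(y), so a rule that is accurate but fires rarely earns little credit.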
