Advanced Genetic Programming Based Machine Learning

This paper presents a Genetic Programming based approach for solving classification problems. Classification is understood as the act of assigning an object to one of a set of categories based on the object's properties; classification algorithms are designed to learn a function that maps a vector of object features to one of several classes. This is done by analyzing a set of input-output examples ("training samples") of the function. Here we present a method based on the theory of Genetic Algorithms and Genetic Programming that interprets classification problems as optimization problems: each presented instance of the classification problem is interpreted as an instance of an optimization problem, and a solution is found by a heuristic optimization algorithm. The major new aspects presented in this paper are advanced algorithmic concepts and genetic operators suited to this problem class (mainly the creation of new hypotheses by merging already existing ones, and their detailed evaluation). The experimental part of the paper documents the results produced using new hybrid variants of Genetic Algorithms, as well as the parameter settings that were investigated. Graphical analysis is done using a novel multiclass classifier analysis concept based on the theory of Receiver Operating Characteristic curves.
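To make the idea of treating classification as an optimization problem concrete, the following is a minimal Python sketch, not the method of the paper. Hypotheses are small arithmetic expression trees over the feature vector, the objective to minimize is the misclassification rate on the training samples, and new hypotheses are created by merging two existing ones through subtree crossover. Everything named here is an assumption made for illustration: the function set, the threshold at zero for a two-class decision, the depth limit, and the simple truncation selection, which stands in for the hybrid Genetic Algorithm variants the paper refers to.

```python
import operator
import random

# Assumed minimal function set; the paper's actual operators are not given here.
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def random_tree(n_features, max_levels, rng):
    """Grow a random tree; leaves are ("x", index) or ("c", constant)."""
    if max_levels == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return ("x", rng.randrange(n_features))
        return ("c", rng.uniform(-1.0, 1.0))
    op = rng.choice(sorted(FUNCTIONS))
    return (op,
            random_tree(n_features, max_levels - 1, rng),
            random_tree(n_features, max_levels - 1, rng))


def evaluate(tree, features):
    """Compute the numeric output of a tree for one feature vector."""
    kind = tree[0]
    if kind == "x":
        return features[tree[1]]
    if kind == "c":
        return tree[1]
    return FUNCTIONS[kind](evaluate(tree[1], features),
                           evaluate(tree[2], features))


def classify(tree, features):
    """Two-class decision: class 1 if the output is positive, else class 0."""
    return 1 if evaluate(tree, features) > 0.0 else 0


def error_rate(tree, samples):
    """Objective to minimize: fraction of misclassified training samples."""
    wrong = sum(1 for features, label in samples
                if classify(tree, features) != label)
    return wrong / len(samples)


def depth(tree):
    """Number of levels in the tree (a single leaf has depth 1)."""
    if tree[0] in FUNCTIONS:
        return 1 + max(depth(tree[1]), depth(tree[2]))
    return 1


def subtrees(tree, path=()):
    """Yield (path, subtree) pairs; a path is a tuple of child positions."""
    yield path, tree
    if tree[0] in FUNCTIONS:
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))


def replace_at(tree, path, new_subtree):
    """Return a copy of tree with the node at path replaced."""
    if not path:
        return new_subtree
    parts = list(tree)
    parts[path[0]] = replace_at(tree[path[0]], path[1:], new_subtree)
    return tuple(parts)


def crossover(a, b, rng, max_depth=8):
    """Merge two hypotheses: graft a random subtree of b into a."""
    path, _ = rng.choice(list(subtrees(a)))
    _, donor = rng.choice(list(subtrees(b)))
    child = replace_at(a, path, donor)
    # Reject oversized children to keep the trees small.
    return child if depth(child) <= max_depth else a


def evolve(samples, n_features, pop_size=60, generations=30, seed=0):
    """Truncation selection with elitism (an assumed, simple scheme)."""
    rng = random.Random(seed)
    population = [random_tree(n_features, 3, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: error_rate(t, samples))
        parents = population[: pop_size // 2]
        children = [crossover(rng.choice(parents), rng.choice(parents), rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return min(population, key=lambda t: error_rate(t, samples))
```

Calling `evolve(samples, n_features)` with a list of `(feature_tuple, label)` pairs returns the tree with the lowest training error found, and `error_rate` can then be applied to held-out samples. Extending the sketch to more than two classes would need a different decision rule than the single threshold assumed here.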
