Genetic programming for medical classification: a program simplification approach

This paper describes a genetic programming (GP) approach to medical data classification problems. In this approach, the evolved genetic programs are simplified online during the evolutionary process using algebraic simplification rules, algebraic equivalence, and prime techniques. The new simplification GP approach is compared with the standard GP approach on two medical data classification problems. The results suggest that, on these problems, the simplification approach is more efficient than the basic GP system, achieves slightly better classification performance, and significantly reduces the size of the evolved programs. Comparison with other methods, including decision trees, naive Bayes, nearest neighbour, nearest centroid, and neural networks, suggests that the new GP approach achieves superior results to almost all of these methods on these problems. The evolved genetic programs are also easier to interpret than the “hidden patterns” discovered by the other methods.
