Automatic Data Based Patient Classification Using Genetic Programming

Classification is a major topic in computer science in general and in medical data mining in particular. When patients are to be classified automatically into those who may be suffering from a certain disease and those who are healthy, one often faces the problem that special tests (such as blood analyses) are costly in both time and money. One therefore searches for hidden relationships between the symptoms of a disease and other features that are much easier to measure. This is usually done by mathematical modeling: on the basis of already known training samples, one tries to find a function that assigns these samples to exactly the classes they actually belong to; the resulting model can then be used to classify new, unknown samples very quickly. Whereas many existing methods require additional information about the underlying system or user interaction, we present a fully automatic, self-adaptive, and problem-instance-independent data-based classification tool built on Genetic Programming. Using benchmark data sets, we document this method's ability to identify models that classify patients simply by analyzing their blood parameters.
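
The modeling step described above (find a function that reproduces the known classes of the training samples, then apply it to new samples) can be illustrated with a minimal Python sketch. It is not the Genetic Programming method of this work: instead of evolving model structures, it only picks the best of three hand-written candidate expressions and a decision threshold by training accuracy. The data, the feature names `x0` and `x1`, and all function names are hypothetical and chosen purely for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Sequence[float]
Model = Callable[[Sample], float]


def classify(model: Model, sample: Sample, threshold: float) -> int:
    """Return class 1 if the model output exceeds the threshold, else class 0."""
    return 1 if model(sample) > threshold else 0


def accuracy(model: Model, samples: List[Sample], labels: List[int],
             threshold: float) -> float:
    """Fraction of samples whose predicted class matches the known class."""
    hits = sum(classify(model, s, threshold) == y
               for s, y in zip(samples, labels))
    return hits / len(samples)


def best_threshold(model: Model, samples: List[Sample],
                   labels: List[int]) -> Tuple[float, float]:
    """Try midpoints between consecutive distinct model outputs as thresholds."""
    outputs = sorted({model(s) for s in samples})
    candidates = [(a + b) / 2 for a, b in zip(outputs, outputs[1:])] or outputs
    acc, thr = max((accuracy(model, samples, labels, t), t) for t in candidates)
    return thr, acc


def select_model(candidates: Dict[str, Model], samples: List[Sample],
                 labels: List[int]) -> Tuple[str, float, float]:
    """Pick the candidate expression with the highest training accuracy.

    A crude stand-in for the search over model structures that
    Genetic Programming would perform.
    """
    best = None
    for name, model in candidates.items():
        thr, acc = best_threshold(model, samples, labels)
        if best is None or acc > best[2]:
            best = (name, thr, acc)
    return best


# Hypothetical training samples: two easily measured features per patient,
# with known classes (0 = healthy, 1 = possibly ill).
TRAIN_X = [(1.0, 5.0), (2.0, 4.0), (1.5, 6.0), (4.0, 1.0), (5.0, 2.0), (4.5, 1.5)]
TRAIN_Y = [0, 0, 0, 1, 1, 1]

# Hypothetical candidate model structures.
CANDIDATES: Dict[str, Model] = {
    "x0 + x1": lambda s: s[0] + s[1],
    "x0 - x1": lambda s: s[0] - s[1],
    "x0 * x1": lambda s: s[0] * s[1],
}
```

On this toy data, `select_model(CANDIDATES, TRAIN_X, TRAIN_Y)` selects the expression `x0 - x1` with threshold 0.5, which separates the training samples perfectly; a new sample such as `(5.0, 1.0)` is then assigned to class 1 by `classify`.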
