Fitness Functions in Genetic Programming for Classification with Unbalanced Data

This paper describes a genetic programming (GP) approach to binary classification problems with class imbalance. The approach is evaluated on two benchmark and two synthetic data sets. The results show that when overall classification accuracy is used as the fitness function, the GP system is strongly biased toward the majority class. Two new fitness functions are developed to address this problem. The experimental results show that both substantially improve performance on the minority class and yield much more balanced performance across the majority and minority classes.
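The bias of accuracy-based fitness can be illustrated with a minimal sketch. The abstract does not define the two new fitness functions, so the code below does not reproduce them; it only contrasts overall accuracy with the average of the per-class accuracies, a common alternative used here purely as an assumption. The function names, the 95/5 class split, and the "always predict the majority class" classifier are all hypothetical.

```python
def overall_accuracy(labels, predictions):
    """Fraction of all examples classified correctly."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def class_accuracy(labels, predictions, cls):
    """Fraction of the examples of class `cls` classified correctly."""
    indices = [i for i, y in enumerate(labels) if y == cls]
    return sum(1 for i in indices if predictions[i] == cls) / len(indices)


def average_class_accuracy(labels, predictions):
    """Mean of the two per-class accuracies (illustrative assumption,
    not one of the fitness functions proposed in the paper)."""
    return 0.5 * (class_accuracy(labels, predictions, 0)
                  + class_accuracy(labels, predictions, 1))


# Hypothetical unbalanced data: 95 majority (0) and 5 minority (1) examples.
labels = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
always_majority = [0] * 100

print(overall_accuracy(labels, always_majority))
print(class_accuracy(labels, always_majority, 1))
print(average_class_accuracy(labels, always_majority))
```

Under these assumptions the degenerate classifier scores highly on overall accuracy while classifying no minority example correctly, which is the kind of bias the abstract reports. A measure that weights the two classes equally rewards it far less.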
