Data Mining based on Gene Expression Programming and Clonal Selection

A hybrid evolutionary technique for data mining tasks is proposed that combines the Clonal Selection Principle with Gene Expression Programming (GEP). The proposed algorithm introduces the notion of Data Class Antigens, used to represent a class of data. The rules are evolved by a clonal selection algorithm that extends the recently proposed CLONALG algorithm; among other new features, it incorporates a receptor editing step. The rules themselves are represented as antibodies coded as GEP chromosomes, in order to exploit the flexibility and expressiveness of that encoding. The algorithm is tested on benchmark problems from the UCI repository, in particular the set of MONK problems and the Pima Indians Diabetes problem. In both cases, prediction accuracy is very satisfactory, albeit slightly lower than that obtained by a standard GEP technique. In terms of convergence rate and computational efficiency, however, the proposed technique markedly outperforms the standard GEP algorithm.
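As a rough illustration of the clonal selection loop described above, the following toy sketch evolves bit-string antibodies toward a single bit-string antigen. It is not the algorithm of the paper: the bit-string encoding (in place of GEP chromosomes), the matching-bits affinity, the clone counts, the mutation rate formula, and all function and parameter names are assumptions made for illustration. In particular, receptor editing is approximated here by replacing the lowest-affinity antibodies with freshly generated ones, which may differ from the operator actually used.

```python
import random


def affinity(antibody, antigen):
    """Toy affinity: number of positions where antibody and antigen agree."""
    return sum(a == b for a, b in zip(antibody, antigen))


def mutate(antibody, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in antibody]


def clonal_selection(antigen, pop_size=20, n_select=5, clone_factor=4,
                     n_edit=2, generations=60, seed=0):
    """Return the highest-affinity antibody found (all parameters hypothetical)."""
    rng = random.Random(seed)
    length = len(antigen)

    def new_antibody():
        return [rng.randint(0, 1) for _ in range(length)]

    def score(antibody):
        return affinity(antibody, antigen)

    population = [new_antibody() for _ in range(pop_size)]
    for _ in range(generations):
        # Select the best antibodies for cloning.
        population.sort(key=score, reverse=True)
        clones = []
        for rank, parent in enumerate(population[:n_select]):
            # Better-ranked parents receive more clones ...
            n_clones = max(1, clone_factor * n_select // (rank + 1))
            # ... and a lower mutation rate (hypermutation inversely
            # related to affinity, floored at one expected flip per clone).
            rate = max(1.0 / length, 0.5 * (1.0 - score(parent) / length))
            clones.extend(mutate(parent, rate, rng) for _ in range(n_clones))
        # Parents compete with their clones, so the best is never lost.
        candidates = sorted(population + clones, key=score, reverse=True)
        survivors = candidates[:pop_size - n_edit]
        # Assumed stand-in for receptor editing: the worst slots are
        # refilled with newly generated antibodies.
        population = survivors + [new_antibody() for _ in range(n_edit)]
    return max(population, key=score)


if __name__ == "__main__":
    target = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
    best = clonal_selection(target)
    print(best, affinity(best, target))
```

Because parents are kept alongside their mutated clones, the best affinity in the population cannot decrease from one generation to the next. In a rule-mining setting the antigen would stand for a data class and the affinity for a rule-quality measure, which this toy example does not attempt to model.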
