Evolving accurate and compact classification rules with gene expression programming

Classification is one of the fundamental tasks of data mining. Most rule induction and decision tree algorithms perform a local, greedy search, and the classification rules they generate are often more complex than necessary. Evolutionary algorithms for pattern classification have recently received increased attention because they can perform global searches. In this paper, we propose a new approach for discovering classification rules using gene expression programming (GEP), a recent variant of genetic programming (GP) that uses a linear representation. The antecedent of a discovered rule may involve many different combinations of attributes. To guide the search process, we suggest a fitness function that considers both rule consistency gain and completeness. A multiclass classification problem is formulated as multiple two-class problems using the one-against-all learning method. A covering strategy is applied to learn multiple rules for each class where applicable. Compact rule sets are subsequently evolved using a two-phase pruning method based on the minimum description length (MDL) principle and the integration theory. Our approach is also noise tolerant and able to deal with both numeric and nominal attributes. Experiments with several benchmark data sets have shown up to 20% improvement in validation accuracy compared with C4.5 algorithms. Furthermore, the proposed GEP approach is more efficient and tends to generate shorter solutions than canonical tree-based GP classifiers.
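The one-against-all decomposition and the covering strategy mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the paper's method: the evolutionary search is replaced by a hypothetical stand-in, `learn_one_rule`, which simply picks the single attribute-value test with the best precision on the target class. The function names, the dictionary-based example format, and the `max_rules` cap are all assumptions made for illustration.

```python
def learn_one_rule(examples, target):
    """Hypothetical stand-in for the evolutionary search (NOT GEP).

    Returns the (attribute, value) test with the highest precision on
    `target`, breaking ties by the number of positives covered.
    """
    best, best_key = None, (0.0, 0)
    for x, _ in examples:
        for attr, val in x.items():
            covered = [y for x2, y in examples if x2.get(attr) == val]
            pos = sum(1 for y in covered if y == target)
            key = (pos / len(covered), pos)
            if pos > 0 and key > best_key:
                best, best_key = (attr, val), key
    return best


def covers(rule, x):
    """True if example x satisfies the rule's single test."""
    attr, val = rule
    return x.get(attr) == val


def covering(examples, target, max_rules=10):
    """Sequential covering: learn a rule, drop the positives it covers, repeat."""
    rules, remaining = [], list(examples)
    while len(rules) < max_rules and any(y == target for _, y in remaining):
        rule = learn_one_rule(remaining, target)
        if rule is None:  # no test covers any remaining positive
            break
        rules.append(rule)
        remaining = [(x, y) for x, y in remaining
                     if not (y == target and covers(rule, x))]
    return rules


def one_against_all(examples):
    """Treat each class in turn as positive and all other classes as negative."""
    classes = sorted({y for _, y in examples})
    return {c: covering(examples, c) for c in classes}
```

Calling `one_against_all` on a list of `(attribute_dict, label)` pairs returns one rule list per class. In the approach described above, the stand-in learner would be replaced by the GEP search guided by the proposed fitness function, and the resulting rule sets would then be pruned.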
