Abstract—This paper presents an analysis of the suitability of grammar-based genetic programming for the classification task in data mining. The evolutionary technique is compared with several classic algorithms for inducing decision trees and rules, using classification accuracy as the comparison criterion.

I. INTRODUCTION

Data mining (DM) consists of the extraction of useful, comprehensible and previously unknown knowledge from huge amounts of data stored in different formats [16]. Classification is one of the most studied problems by DM and machine learning (ML) researchers. It consists in predicting the value of a (categorical) attribute (the class) based on the values of other attributes (the predicting attributes). In the ML and DM fields, classification is usually approached as a supervised learning task. A search algorithm is used to induce a classifier from a set of correctly classified data instances, called the training set. Another set of correctly classified data instances, known as the test set, is used to measure the quality of the classifier obtained after the learning process. Different paradigms have been used to tackle classification: decision trees [10], inductive learning [8], instance-based learning [1] and, more recently, artificial neural networks [18] and evolutionary algorithms [4]. In this paper, we focus on decision tree, rule induction and evolutionary techniques.

Decision tree methods use greedy algorithms. These algorithms are generally fast, very effective, accurate and able to classify data completely. Most decision tree methods use recursive partitioning techniques that split the data space. However, the greedy nature of these algorithms can overlook multivariate relationships that cannot be found when attributes are considered separately. Rule induction algorithms usually employ a specific-to-general approach, in which rules are generalized (or specialized) until a satisfactory description of each class is obtained.
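The train/test evaluation protocol described above can be illustrated with a minimal sketch. This example is not from the paper: the trivial majority-class classifier, the dataset and all names are invented purely to show how classification accuracy on a held-out test set is computed.

```python
# Sketch (illustrative only): inducing a classifier from a training set
# and measuring its quality on a separate test set, as described above.
# The classifier is a trivial majority-class predictor; the data are invented.
from collections import Counter

def train_majority_classifier(training_set):
    """Induce the simplest possible classifier: always predict the
    most frequent class observed in the training set."""
    majority = Counter(cls for _, cls in training_set).most_common(1)[0][0]
    return lambda instance: majority

def accuracy(classifier, test_set):
    """Classification accuracy: fraction of correctly classified test instances."""
    hits = sum(1 for inst, cls in test_set if classifier(inst) == cls)
    return hits / len(test_set)

# Invented example data: each instance is (attribute values, class label).
training_set = [({"x": 1}, "pos"), ({"x": 2}, "pos"), ({"x": 3}, "neg")]
test_set = [({"x": 4}, "pos"), ({"x": 5}, "neg")]

clf = train_majority_classifier(training_set)
print(accuracy(clf, test_set))  # 0.5: one of the two test instances is "pos"
```

Any of the learning paradigms listed above (trees, rules, EAs) would simply replace the induction step; the accuracy criterion used for comparison stays the same.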
Finally, evolutionary algorithms (EAs) are probabilistic search algorithms inspired by certain points of the Darwinian theory of evolution. The flexibility and robustness of EAs allow the discovery of complex relationships that are usually missed by other algorithms.

In addition to the learning algorithm, another important issue that must be considered in classification is the representation formalism. Rules are one of the formalisms most often used to represent classifiers, and they are the one we have chosen for our work (a decision tree can be easily converted into a rule set [12]). The rule antecedent (IF part) contains a combination of conditions on the predicting attributes, and the rule consequent (THEN part) contains the predicted value for the class. Thus, a rule assigns a data instance to the class indicated by the consequent if the values of the predicting attributes satisfy the conditions expressed in the antecedent, and a classifier is represented as a rule set. The rules used in our work have the following format.
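The paper's exact rule grammar is not included in this excerpt, so the following sketch only illustrates the general IF-THEN form described above, assuming a conjunction of attribute-value conditions in the antecedent and a predicted class in the consequent. The attribute names and the sample rule are hypothetical.

```python
# Sketch (assumed form; the paper's own grammar is not shown here):
# an IF-THEN classification rule whose antecedent is a conjunction of
# conditions on predicting attributes and whose consequent is a class.

def make_rule(conditions, predicted_class):
    """conditions: list of (attribute, operator, value) triples,
    interpreted as a conjunction (logical AND)."""
    ops = {"==": lambda a, b: a == b,
           ">": lambda a, b: a > b,
           "<=": lambda a, b: a <= b}
    def fires(instance):
        return all(ops[op](instance[attr], val)
                   for attr, op, val in conditions)
    return fires, predicted_class

def classify(rule_set, instance, default_class):
    """Apply the rules in order; the first rule whose antecedent is
    satisfied assigns the instance to its consequent class."""
    for fires, cls in rule_set:
        if fires(instance):
            return cls
    return default_class

# Hypothetical rule: IF outlook == "sunny" AND humidity > 80 THEN class = "no"
rules = [make_rule([("outlook", "==", "sunny"), ("humidity", ">", 80)], "no")]
print(classify(rules, {"outlook": "sunny", "humidity": 90}, "yes"))  # no
print(classify(rules, {"outlook": "rain", "humidity": 90}, "yes"))   # yes
```

A decision tree maps onto this representation directly: each root-to-leaf path becomes one rule, which is why the conversion noted in [12] is straightforward.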
REFERENCES

[1] Catherine Blake et al. UCI Repository of Machine Learning Databases, 1998.
[2] Peter Clark et al. The CN2 induction algorithm. Machine Learning, 2004.
[3] Peter A. Whigham et al. Grammatical bias for evolutionary learning, 1996.
[4] Ian Witten et al. Data Mining, 2000.
[5] Saso Dzeroski et al. Inductive Logic Programming: Techniques and Applications, 1993.
[6] Alex A. Freitas. Data Mining and Knowledge Discovery with Evolutionary Algorithms. Natural Computing Series, 2002.
[7] Thomas Bäck et al. An Overview of Evolutionary Computation. ECML, 1993.
[8] Robert C. Holte et al. Very Simple Classification Rules Perform Well on Most Commonly Used Datasets. Machine Learning, 1993.
[9] Ryszard S. Michalski et al. A theory and methodology of inductive learning, 1993.
[10] Peter Nordin et al. Genetic Programming - An Introduction: On the Automatic Evolution of Computer Programs and Its Applications, 1998.
[11] Aiko M. Hormann et al. Programs for Machine Learning. Part I. Inf. Control, 1962.
[12] Alex Alves Freitas et al. Book Review: Data Mining Using Grammar-Based Genetic Programming and Applications. Genetic Programming and Evolvable Machines, 2001.
[13] David W. Aha et al. Instance-Based Learning Algorithms. Machine Learning, 1991.
[14] M. M. Kilgo et al. Statistics and Data Analysis: From Elementary to Intermediate, 2001.
[15] J. Ross Quinlan et al. Induction of Decision Trees. Machine Learning, 1986.
[16] Ryszard S. Michalski et al. On the Quasi-Minimal Solution of the General Covering Problem, 1969.
[17] Ian H. Witten et al. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. SGMD, 2002.
[18] Jacek M. Zurada et al. Introduction to Artificial Neural Systems, 1992.
[19] Alex Alves Freitas et al. Guest editorial: data mining and knowledge discovery with evolutionary algorithms. IEEE Trans. Evol. Comput., 2003.
[20] Sreerama K. Murthy et al. Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey. Data Mining and Knowledge Discovery, 1998.