Discovering interesting classification rules with genetic programming

Abstract Data mining deals with the problem of discovering novel and interesting knowledge from large amounts of data. This problem is often tackled heuristically when patterns are difficult to extract using standard query mechanisms or classical statistical methods. This paper presents a genetic programming framework capable of automatically discovering classification rules that are easily comprehensible by humans. The results are compared with those achieved by other techniques on a classical benchmark set. Furthermore, some of the obtained rules are shown and the most discriminating variables are highlighted.
