Attribute Selection with a Multi-objective Genetic Algorithm

In this paper we address the problem of multi-objective attribute selection in data mining. We propose a multi-objective genetic algorithm (GA) based on the wrapper approach to discover the best subset of attributes for a given classification algorithm, namely C4.5, a well-known decision-tree algorithm. The two objectives to be minimized are the error rate and the size of the tree produced by C4.5. The proposed GA is a multi-objective method in the sense that it discovers a set of non-dominated solutions (attribute subsets), according to the concept of Pareto dominance.
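
The notion of Pareto dominance over the two minimized objectives can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `dominates` and `non_dominated`, the attribute names, and the (error rate, tree size) values below are all hypothetical and chosen only for illustration. In the wrapper setting described above, each pair would instead come from running C4.5 on the candidate attribute subset.

```python
from typing import Dict, Tuple

# (error_rate, tree_size); both objectives are minimized.
Objectives = Tuple[float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """Return True if a Pareto-dominates b.

    a dominates b when a is no worse than b on every objective
    and strictly better on at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(
    candidates: Dict[Tuple[str, ...], Objectives]
) -> Dict[Tuple[str, ...], Objectives]:
    """Keep only the attribute subsets that no other subset dominates."""
    return {
        subset: objs
        for subset, objs in candidates.items()
        if not any(
            dominates(other, objs)
            for other_subset, other in candidates.items()
            if other_subset != subset
        )
    }


# Hypothetical attribute subsets with made-up objective values.
candidates = {
    ("a1", "a3"): (0.10, 9.0),
    ("a2",): (0.20, 5.0),
    ("a1", "a2", "a3"): (0.10, 12.0),  # same error as (a1, a3) but larger tree
    ("a3",): (0.25, 7.0),              # worse than (a2,) on both objectives
}

front = non_dominated(candidates)
for subset, (error_rate, tree_size) in sorted(front.items()):
    print(subset, error_rate, tree_size)
```

In this example two subsets remain: one with the lower error rate and one with the smaller tree. Neither is better on both objectives, which is why a multi-objective method returns a set of solutions rather than a single attribute subset.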
