A hybrid decision tree/genetic algorithm method for data mining

This paper addresses the well-known classification task of data mining, where the objective is to predict the class to which an example belongs. Discovered knowledge is expressed in the form of high-level, easy-to-interpret classification rules. To discover such rules, we propose a hybrid decision tree/genetic algorithm method. The central idea of this hybrid method involves the concept of small disjuncts in data mining. In essence, a set of classification rules can be regarded as a logical disjunction of rules, so that each rule can be regarded as a disjunct. A small disjunct is a rule covering a small number of examples. Because each is supported by only a few examples, small disjuncts are error prone. However, although each small disjunct covers just a few examples, the set of all small disjuncts can cover a large number of examples, so it is important to develop new approaches to cope with the problem of small disjuncts. In our hybrid approach, we have developed two genetic algorithms (GAs) specifically designed for discovering rules covering examples belonging to small disjuncts, whereas a conventional decision tree algorithm is used to produce rules covering examples belonging to large disjuncts. We present results evaluating the performance of the hybrid method on 22 real-world data sets.
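As an illustration of the small/large disjunct distinction described above, the following minimal Python sketch groups examples by the rule that covers them and splits the rules by coverage. It is not the authors' implementation: the function name `split_disjuncts`, the stand-in `leaf_of` mapping (a placeholder for a trained decision tree), and the coverage threshold of 3 are all assumptions made for this example, since the abstract does not state how "small" is defined.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable, Set, Tuple


def split_disjuncts(
    examples: Iterable,
    leaf_of: Callable[[object], Hashable],
    threshold: int,
) -> Tuple[Set[Hashable], Set[Hashable], Counter]:
    """Split rules (tree leaves) into small and large disjuncts by coverage.

    `leaf_of` maps an example to the identifier of the rule covering it.
    `threshold` is a hypothetical cut-off: rules covering at most this many
    examples are treated as small disjuncts.
    """
    coverage = Counter(leaf_of(x) for x in examples)
    small = {leaf for leaf, n in coverage.items() if n <= threshold}
    large = set(coverage) - small
    return small, large, coverage


# Toy data: 20 integer "examples" and a hand-written stand-in for a tree.
# Rule "A" covers 10 examples; rules "S5".."S9" cover 2 examples each.
examples = list(range(20))
leaf_of = lambda x: "A" if x < 10 else f"S{x // 2}"

small, large, coverage = split_disjuncts(examples, leaf_of, threshold=3)

# Examples that would go to the small-disjunct learner (a GA in the hybrid
# method); the remaining examples keep their decision tree rule.
small_examples = [x for x in examples if leaf_of(x) in small]
small_coverage = len(small_examples)
```

In this toy setting each small disjunct covers only 2 examples, yet together they cover 10 of the 20 examples, which mirrors the abstract's point that small disjuncts can collectively account for a large share of the data.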
