Discovering Knowledge Nuggets with a Genetic Algorithm

Measuring the quality of a prediction rule is a difficult task that can involve several criteria. Most of the rule induction literature focuses on discovering accurate, comprehensible rules. In this chapter we take these two criteria into account, but we go beyond them by aiming to discover rules that are also interesting (surprising) to the user. Hence, the search for rules is guided by a rule-evaluation function that considers both the predictive accuracy and the interestingness of candidate rules. The search is performed by two versions of a genetic algorithm (GA) specifically designed for the discovery of interesting rules, or “knowledge nuggets.” The algorithm addresses the dependence modeling task (sometimes called “generalized rule induction”), in which different rules can predict different goal attributes. This task can be regarded as a generalization of the well-known classification task, in which all rules predict the same goal attribute. The chapter also compares the results of the two versions of the GA with those of a simpler, greedy rule induction algorithm for discovering interesting rules.
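As a rough illustration of a rule-evaluation function that combines predictive accuracy with interestingness, consider the minimal sketch below. It is not the chapter's actual fitness function: the weighted-sum form, the weights `w_acc` and `w_int`, the placeholder interestingness measure (one minus the prior frequency of the predicted value), and all function names are assumptions made only for this example. Because the goal attribute is a parameter of each rule, different rules may predict different goal attributes, as in dependence modeling.

```python
from typing import Dict, List

Record = Dict[str, str]


def covers(antecedent: Dict[str, str], record: Record) -> bool:
    """True if the record satisfies every attribute=value condition."""
    return all(record.get(attr) == val for attr, val in antecedent.items())


def accuracy(antecedent: Dict[str, str], goal_attr: str, goal_val: str,
             data: List[Record]) -> float:
    """Fraction of covered records whose goal attribute has the predicted value."""
    covered = [r for r in data if covers(antecedent, r)]
    if not covered:
        return 0.0
    correct = sum(1 for r in covered if r[goal_attr] == goal_val)
    return correct / len(covered)


def interestingness(goal_attr: str, goal_val: str, data: List[Record]) -> float:
    """Placeholder measure (an assumption): rarer predicted values score higher."""
    prior = sum(1 for r in data if r[goal_attr] == goal_val) / len(data)
    return 1.0 - prior


def fitness(antecedent: Dict[str, str], goal_attr: str, goal_val: str,
            data: List[Record], w_acc: float = 0.5, w_int: float = 0.5) -> float:
    """Hypothetical weighted sum of accuracy and interestingness."""
    return (w_acc * accuracy(antecedent, goal_attr, goal_val, data)
            + w_int * interestingness(goal_attr, goal_val, data))


if __name__ == "__main__":
    toy = [
        {"outlook": "sunny", "windy": "no", "play": "yes"},
        {"outlook": "sunny", "windy": "yes", "play": "no"},
        {"outlook": "rain", "windy": "no", "play": "yes"},
        {"outlook": "rain", "windy": "yes", "play": "no"},
    ]
    # Two rules with different goal attributes ("play" and "windy").
    print(fitness({"windy": "yes"}, "play", "no", toy))
    print(fitness({"outlook": "sunny"}, "windy", "yes", toy))
```

In a GA, a function of this kind would score each candidate rule in the population; how accuracy and interestingness are actually measured and traded off is a design choice that this sketch does not settle.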
