Data mining rules using multi-objective evolutionary algorithms

In data mining, nugget discovery is the search for interesting classification rules that apply to a target class. Previous research has used heuristic methods (genetic algorithms, simulated annealing and tabu search) to optimise a single measure of interest. This paper proposes using multi-objective evolutionary algorithms so that the user can interactively select several interest measures and obtain the best nuggets (an approximation to the Pareto-optimal set) according to those measures. Initial experiments are conducted on a number of databases, using an implementation of the fast elitist non-dominated sorting genetic algorithm (NSGA-II) and two well-known measures of interest. The results are compared with those obtained using modern heuristic methods, and they indicate the potential of multi-objective evolutionary algorithms for the task of nugget discovery.
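
To make the notion of "best nuggets" under several measures concrete, the following is a minimal sketch of Pareto dominance over candidate rules. It is not the paper's implementation and it is not NSGA-II itself; it only shows the non-dominated filter that such algorithms approximate. The two measures used here (confidence and coverage), the `Rule` type, and the sample values are all assumptions chosen for illustration, since the abstract does not name the measures.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    """A candidate nugget scored on two interest measures (hypothetical choice)."""
    name: str
    confidence: float  # assumed measure 1, higher is better
    coverage: float    # assumed measure 2, higher is better


def dominates(a: Rule, b: Rule) -> bool:
    """Return True if a is no worse than b on both measures and better on at least one."""
    no_worse = a.confidence >= b.confidence and a.coverage >= b.coverage
    better = a.confidence > b.confidence or a.coverage > b.coverage
    return no_worse and better


def pareto_front(rules: List[Rule]) -> List[Rule]:
    """Keep only the rules that no other rule dominates (quadratic-time filter)."""
    return [r for r in rules if not any(dominates(other, r) for other in rules)]


# Illustrative values only; they are not results from the paper.
rules = [
    Rule("r1", 0.95, 0.10),
    Rule("r2", 0.80, 0.40),
    Rule("r3", 0.60, 0.70),
    Rule("r4", 0.75, 0.35),  # dominated by r2
    Rule("r5", 0.60, 0.50),  # dominated by r3
]

front = pareto_front(rules)
print([r.name for r in front])
```

In this sketch no single rule is best on both measures, so the user is offered the whole trade-off curve (r1, r2, r3) rather than one optimum, which is the motivation for a multi-objective search over a single-measure one.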
