The application and effectiveness of a multi-objective metaheuristic algorithm for partial classification

In this paper, we present an application of multi-objective metaheuristics to the field of data mining. We introduce the data mining task of nugget discovery (also known as partial classification) and show how the multi-objective metaheuristic algorithm NSGA-II can be modified to solve this problem. We also present an alternative algorithm for the same task, the ARAC algorithm, which can find all rules that are best according to some measures of interest, subject to certain constraints. For databases of limited size, the ARAC algorithm provides an excellent basis for comparison with the results of the multi-objective metaheuristic algorithm, as it can deliver the Pareto optimal front consisting of all partial classification rules that lie on the upper confidence/coverage border. We present the results of experiments with both algorithms on various well-known databases. We also discuss how the two methods can complement each other on large databases to deliver a set of best rules according to some predefined criteria, providing a powerful tool for knowledge discovery in databases.
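
To make the notion of a confidence/coverage border concrete, the following minimal Python sketch filters a set of candidate rules down to its non-dominated (Pareto optimal) subset. It is an illustration only, not the NSGA-II or ARAC procedure itself. It assumes the commonly used definitions in which confidence is the fraction of records matching the rule's antecedent that belong to the target class, and coverage is the fraction of target-class records that the rule matches; the names `rule_quality`, `dominates`, and `pareto_front`, and the example rules, are hypothetical.

```python
from typing import List, Tuple

# A rule is summarised here as (label, confidence, coverage).
Rule = Tuple[str, float, float]


def rule_quality(n_antecedent: int, n_both: int, n_class: int) -> Tuple[float, float]:
    """Return (confidence, coverage) under the assumed definitions.

    n_antecedent: records matching the rule's antecedent
    n_both:       records matching the antecedent AND in the target class
    n_class:      records in the target class
    """
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    coverage = n_both / n_class if n_class else 0.0
    return confidence, coverage


def dominates(a: Rule, b: Rule) -> bool:
    """a dominates b if it is no worse on both objectives and better on at least one."""
    no_worse = a[1] >= b[1] and a[2] >= b[2]
    strictly_better = a[1] > b[1] or a[2] > b[2]
    return no_worse and strictly_better


def pareto_front(rules: List[Rule]) -> List[Rule]:
    """Keep only the rules that no other rule dominates (quadratic-time filter)."""
    return [r for r in rules if not any(dominates(other, r) for other in rules)]


# Hypothetical candidate rules: (label, confidence, coverage).
candidates: List[Rule] = [
    ("r1", 0.9, 0.2),
    ("r2", 0.7, 0.5),
    ("r3", 0.6, 0.4),  # dominated by r2 on both objectives
    ("r4", 0.5, 0.9),
]
front = pareto_front(candidates)
```

In this sketch `front` keeps `r1`, `r2`, and `r4`, each of which trades confidence against coverage differently, while `r3` is discarded because `r2` is better on both measures. An exhaustive method can enumerate such a front exactly on small databases, whereas a metaheuristic approximates it.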
