Ensembles of classifiers based on rough sets theory and set-oriented database operations

In this paper we present a new approach to constructing a good ensemble of classifiers for data mining applications, based on rough set theory and database set operations. We borrow the main ideas of rough set theory and redefine them in database-theoretic terms to take advantage of very efficient set-oriented database operations. Our method first computes a set of reducts, each of which includes all the attributes necessary for the decision categories. For each reduct, a reduct table is generated by removing the attributes that are not in the reduct. A novel rule induction algorithm then computes the maximal generalized rules for each reduct table, and a set of reduct classifiers is formed from the corresponding reducts. Our rule induction algorithm adopts a "conquer-without-separating" strategy to generate a set of globally best rules from the data set. The experimental results indicate that the rough-set-based approach is very promising for ensembles of classifiers.
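The pipeline described above (project each reduct table, induce rules per reduct, combine the reduct classifiers by voting) can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the paper's method: the reducts are assumed rather than computed from the discernibility structure, and trivial value-tuple-to-majority-decision rules stand in for the maximal generalized rules that the conquer-without-separating induction would produce.

```python
from collections import Counter

def project(rows, reduct):
    """Build a reduct table: keep only the reduct attributes (plus the decision)."""
    return [({a: cond[a] for a in reduct}, dec) for cond, dec in rows]

def induce_rules(reduct_rows):
    """Toy rule induction: map each condition-value tuple to its majority decision."""
    votes = {}
    for cond, dec in reduct_rows:
        key = tuple(sorted(cond.items()))
        votes.setdefault(key, Counter())[dec] += 1
    return {key: c.most_common(1)[0][0] for key, c in votes.items()}

def ensemble_classify(obj, reducts, rule_sets):
    """Each reduct classifier casts one vote; return the majority decision."""
    ballots = Counter()
    for reduct, rules in zip(reducts, rule_sets):
        key = tuple(sorted((a, obj[a]) for a in reduct))
        if key in rules:
            ballots[rules[key]] += 1
    return ballots.most_common(1)[0][0] if ballots else None

# Toy decision table: (condition attributes, decision class)
table = [
    ({"a": 1, "b": 0, "c": 1}, "yes"),
    ({"a": 1, "b": 1, "c": 0}, "yes"),
    ({"a": 0, "b": 0, "c": 1}, "no"),
    ({"a": 0, "b": 1, "c": 0}, "no"),
]
reducts = [["a"], ["a", "b"]]  # assumed for illustration, not computed
rule_sets = [induce_rules(project(table, r)) for r in reducts]
print(ensemble_classify({"a": 1, "b": 0, "c": 0}, reducts, rule_sets))  # -> yes
```

Each reduct yields an independent classifier over a smaller attribute set, which is what gives the ensemble its diversity; the projection step is where set-oriented database operations (projection and duplicate elimination) would apply.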
