Dynamic integration of multiple data mining techniques in a knowledge discovery management system

One of the most important directions in improving data mining and knowledge discovery is the integration of multiple classification techniques in an ensemble of classifiers. An integration technique should be able to estimate the performance of the component classifiers and select the most appropriate ones from the ensemble. We present two variations of an advanced dynamic integration technique, each using a different distance metric. The technique is a variation of the stacked generalization method, based on the assumption that each component classifier is the best one within a certain subarea of the entire domain. Our technique includes two phases: a learning phase and an application phase. During the learning phase, a performance matrix is derived for each component classifier using the instances of the training set. Each matrix thus encodes information about the 'competence area' of the corresponding component classifier. These matrices are used during the application phase to predict the performance of each component classifier on each new instance. The technique is evaluated on three data sets from the UCI machine learning repository on which well-known classification methods have not proved successful. The comparison results show that our dynamic integration technique outperforms the weighted voting and cross-validation majority techniques on some data sets.
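The two-phase scheme can be sketched in code. The following is a minimal illustration, not the authors' implementation: the class name `DynamicSelector`, the choice of k nearest neighbours, and the plain Euclidean metric are all assumptions made for the example. The learning phase builds a 0/1 error matrix over the training instances (the "performance matrix"); the application phase estimates each classifier's local error around a new instance from its nearest training neighbours and delegates prediction to the classifier with the lowest estimated error.

```python
import numpy as np

class DynamicSelector:
    """Illustrative two-phase dynamic integration of component classifiers.

    Learning phase: record, for every training instance, whether each
    component classifier predicts it correctly (a 0/1 error matrix).
    Application phase: for a new instance, estimate each classifier's
    local error as the mean error over its k nearest training neighbours
    (Euclidean distance here; the paper compares two metrics), then
    delegate to the classifier with the lowest estimated local error.
    """

    def __init__(self, classifiers, k=3):
        self.classifiers = classifiers  # objects with fit(X, y) / predict(X)
        self.k = k

    def fit(self, X, y):
        self.X_train = np.asarray(X, dtype=float)
        y = np.asarray(y)
        # Performance matrix: rows = training instances, columns = classifiers;
        # an entry is 1.0 where the classifier errs on that instance.
        self.errors = np.column_stack([
            (clf.fit(X, y).predict(X) != y).astype(float)
            for clf in self.classifiers
        ])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        preds = []
        for x in X:
            # Distances from the new instance to all training instances.
            d = np.linalg.norm(self.X_train - x, axis=1)
            nearest = np.argsort(d)[:self.k]
            # Estimated local error of each component classifier around x.
            local_err = self.errors[nearest].mean(axis=0)
            best = int(np.argmin(local_err))
            preds.append(self.classifiers[best].predict(x.reshape(1, -1))[0])
        return np.array(preds)
```

In-sample error is used here purely for brevity; estimating the performance matrix with cross-validation, as the integration literature recommends, would avoid rewarding overfitted component classifiers.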
