Paper Information - Support Vector Machines for Knowledge Discovery

Support Vector Machines for Knowledge Discovery

In this paper, we apply support vector machines (SVMs) to knowledge discovery (KD) and confirm their effectiveness on a benchmark data set. SVMs have been applied successfully to problems in various domains, but their effectiveness as a KD method is unknown. We propose SVM for KD, which handles binary classification problems by rescaling each attribute based on z-scores. SVM for KD can sort attributes by their effectiveness in discriminating between classes, and it can also discover the examples that are crucial for discrimination. We set up six discovery tasks with the meningoencephalitis data set, a benchmark data set in KD. A domain expert rated the discovery outcomes of SVM for KD on a scale from one to five with respect to several criteria. The attributes selected in all six tasks are valid and useful: their average scores are 3.8-4.0. Ordering attributes by usefulness is a challenging problem; nevertheless, our method achieved a score of 4.0 or higher on this criterion in three tasks. In addition, the crucial examples for discrimination and the typical examples for each class agree with medical knowledge. These promising results demonstrate the effectiveness of our approach.
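The abstract names three ingredients: z-score rescaling of each attribute, an ordering of attributes by discriminative value, and the identification of crucial examples. The following minimal sketch illustrates how these could fit together; it is not the authors' implementation. It assumes a linear SVM trained by batch subgradient descent, ranks attributes by the absolute value of their learned weights, and uses the examples with the smallest margins as a stand-in for crucial examples. The function names, hyperparameters, and toy data are all hypothetical, and the data are not taken from the meningoencephalitis data set.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Rescale each attribute (column) to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sigmas)] for row in rows]


def decision_value(w, b, z):
    """Signed score of example z under the linear model (w, b)."""
    return sum(wj * zj for wj, zj in zip(w, z)) + b


def train_linear_svm(Z, y, lam=0.01, lr=0.05, epochs=500):
    """Batch subgradient descent on the L2-regularised hinge loss.

    lam, lr and epochs are illustrative defaults, not values from the paper.
    """
    n, d = len(Z), len(Z[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for z, label in zip(Z, y):
            if label * decision_value(w, b, z) < 1:  # margin violated
                for j in range(d):
                    grad_w[j] -= label * z[j] / n
                grad_b -= label / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def rank_attributes(w):
    """Attribute indices ordered by |weight|, largest first (an assumption)."""
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)


def closest_examples(Z, y, w, b, k=2):
    """Indices of the k examples with the smallest margins."""
    margins = [label * decision_value(w, b, z) for z, label in zip(Z, y)]
    return sorted(range(len(Z)), key=margins.__getitem__)[:k]


# Hypothetical toy data: attribute 0 separates the classes, attribute 1 does not.
X = [[1.0, 5.0], [2.0, 3.0], [1.5, 4.0], [8.0, 4.5], [9.0, 3.5], [8.5, 4.0]]
y = [-1, -1, -1, 1, 1, 1]

Z = zscore_columns(X)
w, b = train_linear_svm(Z, y)
ranking = rank_attributes(w)
crucial = closest_examples(Z, y, w, b)
print("attribute ranking:", ranking)
print("examples closest to the boundary:", crucial)
```

Because every attribute is rescaled to unit variance first, the weight magnitudes are comparable across attributes, which is what makes a weight-based ordering meaningful in this sketch.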

Shusaku Tsumoto | Einoshin Suzuki | Shinsuke Sugaya
