Nonlinear Support Vector Machine Visualization for Risk Factor Analysis Using Nomograms and Localized Radial Basis Function Kernels

Nonlinear classifiers, e.g., support vector machines (SVMs) with radial basis function (RBF) kernels, have been widely used for automatic diagnosis of diseases because of their high accuracy. However, these classifiers are difficult to visualize, which makes it difficult to give physicians an intuitive interpretation of the results. We developed a new nonlinear kernel, the localized radial basis function (LRBF) kernel, and a new visualization system, Visualization for Risk Factor Analysis (VRIFA), which applies a nomogram and the LRBF kernel to visualize the results of nonlinear SVMs and improve their interpretability while maintaining high prediction accuracy. Three representative medical datasets from the University of California, Irvine repository and Statlog dataset (breast cancer, diabetes, and heart disease) were used to evaluate the system. The results showed that the classification performance of the LRBF kernel is comparable with that of the RBF kernel, and that the LRBF kernel is easy to visualize via a nomogram. Our study also showed that the LRBF kernel is less sensitive to noise features than the RBF kernel, but that its prediction accuracy degrades more when important features are eliminated. We demonstrated the VRIFA system, which visualizes the results of linear SVMs and of nonlinear SVMs with LRBF kernels, on the three datasets.
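The abstract does not state the LRBF formula, so the following is only a minimal sketch of why a per-feature kernel is easy to draw as a nomogram. It assumes an additive form, K(x, z) = sum over features k of exp(-gamma * (x_k - z_k)^2), which may differ from the paper's exact definition. The function names, the `gamma` value, and the example numbers are all hypothetical. With such a kernel, the SVM decision value (without the bias term) splits into one contribution per feature, which is what a nomogram axis needs; the standard RBF kernel, a single exponential over the whole distance, does not split this way.

```python
import numpy as np

def rbf_kernel(x, z, gamma=0.5):
    """Standard RBF kernel: one exponential over the full squared distance."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

def additive_rbf_kernel(x, z, gamma=0.5):
    """Assumed per-feature form: a sum of one-dimensional RBF terms."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-gamma * diff ** 2).sum())

def feature_contributions(x, support_vectors, dual_coefs, gamma=0.5):
    """Split the decision value (bias excluded) into one term per feature.

    dual_coefs holds the signed coefficients (alpha_i * y_i) of a trained SVM;
    here they are supplied by hand for illustration.
    """
    sv = np.asarray(support_vectors, dtype=float)
    coefs = np.asarray(dual_coefs, dtype=float)
    # terms[i, k] = exp(-gamma * (sv[i, k] - x[k])^2)
    terms = np.exp(-gamma * (sv - np.asarray(x, dtype=float)) ** 2)
    # Weighted sum over support vectors leaves one value per feature.
    return coefs @ terms

# Hypothetical patient vector, support vectors, and coefficients.
x = np.array([0.2, 1.0, -0.5])
sv = np.array([[0.0, 1.0, 0.0], [1.0, -1.0, 0.5]])
coefs = np.array([0.7, -0.3])

per_feature = feature_contributions(x, sv, coefs)
decision_without_bias = per_feature.sum()
```

Each entry of `per_feature` can be plotted on its own axis as a function of that feature's value, and the axes add up to the decision value, which is the property a nomogram relies on.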
