Novel multi-class feature selection methods using sensitivity analysis of posterior probabilities

Novel feature-selection methods are proposed for multi-class support-vector-machine (SVM) learning, based on two new feature-ranking criteria. Both criteria, collectively termed multi-class feature-based sensitivity of posterior probabilities (MFSPP), evaluate the importance of a feature by aggregating, over the feature space, the absolute difference between the probabilistic outputs of the multi-class SVM with and without that feature. Because the criteria are computationally expensive in their original form, three approximations, MFSPP1 through MFSPP3, are proposed. In a carefully controlled experimental study, all three approximations are tested on a range of artificial and benchmark datasets. The results show that they outperform multi-class versions of the support-vector-machine recursive-feature-elimination (SVM-RFE) method as well as other standard filter methods, with one of the three approximations holding a slight edge over the other two. The experiments also indicate that the advantage of the proposed methods is particularly pronounced when the training dataset is sparse.
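
As a rough illustration of the MFSPP idea (not the paper's exact MFSPP1 through MFSPP3 approximations), the Python sketch below ranks features by the average absolute change in the SVM's Platt-scaled posterior probabilities when a feature is suppressed. The function name mfspp_scores, the use of scikit-learn's SVC, the averaging over training samples in place of the integral over the feature space, and the mean-substitution surrogate for "removing" a feature are all assumptions introduced here for illustration.

import numpy as np
from sklearn.svm import SVC

def mfspp_scores(X, y, C=1.0, gamma="scale"):
    # Fit a multi-class SVM with probabilistic outputs; scikit-learn
    # obtains these via Platt scaling combined with pairwise coupling.
    clf = SVC(C=C, gamma=gamma, probability=True)
    clf.fit(X, y)
    p_full = clf.predict_proba(X)  # posteriors with all features present

    n_features = X.shape[1]
    scores = np.zeros(n_features)
    for j in range(n_features):
        X_j = X.copy()
        X_j[:, j] = X[:, j].mean()     # surrogate for removing feature j
        p_j = clf.predict_proba(X_j)   # posteriors with feature j suppressed
        # Aggregate the absolute change in the posteriors over classes,
        # averaged over the training samples (a stand-in for the paper's
        # aggregation over the feature space).
        scores[j] = np.abs(p_full - p_j).sum(axis=1).mean()
    return scores  # higher score: feature matters more

Mean substitution is used here only because it avoids retraining one SVM per candidate feature; how the paper's own approximations reduce the cost of the original criteria is not reflected in this sketch.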
