Novel multi-class feature selection methods using sensitivity analysis of posterior probabilities

Novel feature-selection methods are proposed for multi-class support-vector-machine (SVM) learning, based on two new feature-ranking criteria. Both criteria, collectively termed multi-class feature-based sensitivity of posterior probabilities (MFSPP), evaluate the importance of a feature by aggregating, over the feature space, the absolute difference between the probabilistic outputs of the multi-class SVM with and without that feature. Because the criteria are computationally expensive in their original form, three approximations, MFSPP1 through MFSPP3, are proposed. In a carefully controlled experimental study, all three approximations are tested on a range of artificial and benchmark datasets. The results show that they outperform multi-class versions of the support-vector-machine recursive-feature-elimination (SVM-RFE) method as well as other standard filter methods, with one of the three approximations holding a slight edge over the other two. The experiments also indicate that the advantage of the proposed methods is particularly pronounced when the training dataset is sparse.
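
As a rough illustration of the MFSPP idea (not the paper's exact MFSPP1 through MFSPP3 approximations), the Python sketch below ranks features by the average absolute change in the SVM's Platt-scaled posterior probabilities when a feature is suppressed. The function name mfspp_scores, the use of scikit-learn's SVC, the averaging over training samples in place of the integral over the feature space, and the mean-substitution surrogate for "removing" a feature are all assumptions introduced here for illustration.

import numpy as np
from sklearn.svm import SVC

def mfspp_scores(X, y, C=1.0, gamma="scale"):
    # Fit a multi-class SVM with probabilistic outputs; scikit-learn
    # obtains these via Platt scaling combined with pairwise coupling.
    clf = SVC(C=C, gamma=gamma, probability=True)
    clf.fit(X, y)
    p_full = clf.predict_proba(X)  # posteriors with all features present

    n_features = X.shape[1]
    scores = np.zeros(n_features)
    for j in range(n_features):
        X_j = X.copy()
        X_j[:, j] = X[:, j].mean()     # surrogate for removing feature j
        p_j = clf.predict_proba(X_j)   # posteriors with feature j suppressed
        # Aggregate the absolute change in the posteriors over classes,
        # averaged over the training samples (a stand-in for the paper's
        # aggregation over the feature space).
        scores[j] = np.abs(p_full - p_j).sum(axis=1).mean()
    return scores  # higher score: feature matters more

Mean substitution is used here only because it avoids retraining one SVM per candidate feature; how the paper's own approximations reduce the cost of the original criteria is not reflected in this sketch.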
