Bayesian Support Vector Machines for Feature Ranking and Selection

In this chapter, we develop and evaluate a feature selection algorithm for Bayesian support vector machines. The relevance levels of the features are represented by automatic relevance determination (ARD) parameters, which are optimized by maximizing the model evidence within the Bayesian framework. The features are ranked in descending order of their optimal ARD values, and forward selection is then carried out to determine the minimal set of relevant features. In numerical experiments, our ARD-based ranking yields a more compact feature set than standard ranking techniques, along with better generalization performance.
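The ranking-then-forward-selection pipeline can be sketched on toy data. This is a minimal illustration, not the chapter's method: the per-feature relevance scores below are a simple stand-in (separation of class means) for the ARD parameters that the chapter obtains by evidence maximization, and the nearest-centroid classifier is a hypothetical stand-in for the Bayesian SVM used to score candidate subsets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 6 features, but only features 0 and 1 carry class signal.
n = 200
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, 6))
X[:, 0] += 2.0 * y
X[:, 1] -= 2.0 * y

def accuracy(X, y, cols):
    # Nearest-class-centroid classifier restricted to the selected columns
    # (stand-in for the Bayesian SVM evaluated in the chapter).
    Xs = X[:, cols]
    mu0, mu1 = Xs[y == 0].mean(axis=0), Xs[y == 1].mean(axis=0)
    d0 = ((Xs - mu0) ** 2).sum(axis=1)
    d1 = ((Xs - mu1) ** 2).sum(axis=1)
    return ((d1 < d0).astype(int) == y).mean()

# Stand-in "ARD" relevance scores: per-feature separation of class means.
# The chapter instead optimizes ARD parameters by maximizing the evidence.
relevance = np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0))
ranking = np.argsort(relevance)[::-1]  # descending relevance

# Forward selection along the ranking: keep adding features while the
# score improves by more than a small tolerance, then stop.
selected, best = [], 0.0
for f in ranking:
    acc = accuracy(X, y, selected + [int(f)])
    if acc > best + 1e-3:
        selected.append(int(f))
        best = acc
    else:
        break

print("ranking:", ranking.tolist())
print("selected:", selected, "accuracy:", round(best, 2))
```

On this toy problem the two informative features head the ranking, and forward selection stops once the remaining (noise) features no longer improve the score, recovering a compact subset.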
