Feature Weighting Using Margin and Radius Based Error Bound Optimization in SVMs

The Support Vector Machine (SVM) error bound is a function of both the margin and the radius. Standard SVM algorithms maximize the margin within a given feature space; because the feature space is given, the radius is fixed and is therefore ignored in the optimization. We propose an extension of the standard SVM optimization that also accounts for the radius, in order to produce a tighter error bound than the one obtained by controlling the margin alone. We use a second set of parameters, μ, that control the radius, thereby introducing an explicit feature weighting mechanism into the SVM algorithm. We impose an ℓ1 constraint on μ, which results in a sparse vector and thus performs feature selection. Our original formulation is not convex; we give a convex approximation and show how to solve it. We experiment with real-world datasets and report predictive performance that compares favorably with the standard SVM.
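
For orientation, the bound referred to above is commonly stated in the radius-margin form sketched below. This is the textbook form from the statistical learning theory literature (constants vary between statements), not necessarily the exact formulation of this paper. The symbols R, γ, n, w, b, and d, and the particular way μ rescales the features, are assumptions introduced here for illustration only.

```latex
% Radius-margin bound, standard hard-margin form (assumed notation):
%   R      = radius of the smallest ball enclosing the training points
%   \gamma = 1 / \|\mathbf{w}\| = margin of the separating hyperplane
%   n      = number of training examples
\[
  \text{leave-one-out error}
  \;\le\; \frac{1}{n}\,\frac{R^{2}}{\gamma^{2}}
  \;=\; \frac{1}{n}\,R^{2}\,\|\mathbf{w}\|^{2}
\]

% Hypothetical feature weighting: feature k is scaled by \sqrt{\mu_k},
% so both the radius and the margin become functions of \mu.
\[
  \min_{\mathbf{w},\,b,\,\boldsymbol{\mu}} \;
  R^{2}(\boldsymbol{\mu})\,\|\mathbf{w}\|^{2}
  \quad \text{s.t.} \quad
  y_i\bigl(\langle \mathbf{w},\, \sqrt{\boldsymbol{\mu}} \odot \mathbf{x}_i \rangle + b\bigr) \ge 1,
  \quad \mu_k \ge 0,
  \quad \sum_{k=1}^{d} \mu_k = 1
\]
```

Under this assumed parameterization, the product R²‖w‖² is unchanged when all features are multiplied by the same constant, so some normalization of μ is needed to make the problem well posed. With μ ≥ 0, the sum constraint is an ℓ1 constraint, which is what encourages many μ_k to be exactly zero.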