Maximum Likelihood Model Selection for 1-Norm Soft Margin SVMs with Multiple Parameters

Adapting the hyperparameters of support vector machines (SVMs) is a challenging model selection problem, especially when flexible kernels are to be adapted and data are scarce. We present a coherent framework for regularized model selection of 1-norm soft margin SVMs for binary classification. It is proposed to use gradient ascent on a likelihood function of the hyperparameters. The likelihood function is based on logistic regression for robustly estimating the class conditional probabilities and can be computed efficiently. Overfitting is an important issue in SVM model selection and can be addressed in our framework by incorporating suitable prior distributions over the hyperparameters. We show empirically that gradient-based optimization of the likelihood function is able to adapt multiple kernel parameters and leads to better models than four competing state-of-the-art methods.
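
A minimal sketch of the model-selection loop the abstract describes, assuming scikit-learn and SciPy. The fixed-slope sigmoid, the Gaussian prior on the log-hyperparameters, and the finite-difference gradient are illustrative simplifications standing in for the efficient likelihood and gradient computation the paper refers to; `neg_log_posterior` and the hyperparameter choice (C and an RBF kernel width) are assumptions of this sketch, not the authors' exact setup.

```python
# Sketch: maximum likelihood SVM model selection with a prior over
# hyperparameters. Assumes scikit-learn and SciPy are available.
import numpy as np
from scipy.optimize import minimize
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_predict

def neg_log_posterior(theta, X, y):
    """Negative penalized log-likelihood of theta = (log C, log gamma)."""
    C, gamma = np.exp(theta)  # optimize in log space so C, gamma stay positive
    svm = SVC(C=C, gamma=gamma)
    # Out-of-sample SVM decision values f(x) via cross-validation.
    f = cross_val_predict(svm, X, y, cv=5, method="decision_function")
    # Logistic model of the class conditional probability,
    # P(y = 1 | x) = 1 / (1 + exp(-f(x))); here with fixed slope and
    # offset for brevity, whereas Platt-style scaling fits both by
    # maximum likelihood.
    p = 1.0 / (1.0 + np.exp(-f))
    eps = 1e-12  # numerical guard against log(0)
    log_lik = np.sum(np.where(y == 1, np.log(p + eps), np.log(1.0 - p + eps)))
    # Gaussian prior on the log-hyperparameters regularizes model
    # selection and counteracts overfitting of the hyperparameters.
    log_prior = -0.5 * np.sum(theta ** 2)
    return -(log_lik + log_prior)

# Usage: gradient-based maximization of the penalized likelihood;
# L-BFGS-B falls back to a finite-difference gradient here, standing in
# for an analytic gradient.
# X, y = ...  # binary labels in {-1, +1} or {0, 1}
# res = minimize(neg_log_posterior, x0=np.zeros(2), args=(X, y),
#                method="L-BFGS-B")
# C_opt, gamma_opt = np.exp(res.x)
```

Optimizing in log space is the natural parameterization for positive hyperparameters such as C and the kernel width, and a prior there corresponds to the regularized model selection the abstract advocates.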
