Support vector machines and strictly positive definite kernel: The regularization hyperparameter is more important than the kernel hyperparameters

When dealing with a Support Vector Machine (SVM) with a strictly positive definite kernel, a common misconception is that the kernel hyperparameters are the main handle for controlling the nonlinearity of the classification surface. We show here that this is not the case: we prove that, regardless of the values of the kernel hyperparameters, it is always possible to tune the nonlinearity of the classifier by acting only on the regularization hyperparameter C, even to the point of perfectly classifying any non-degenerate training set.
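
The effect of C at a fixed kernel width can be illustrated on a toy problem. The sketch below is not the construction used in the proof. It assumes a bias-free soft-margin SVM with a Gaussian kernel, trained by plain dual coordinate ascent, on a hypothetical five-point 1-D dataset. The names `train_svm_no_bias`, `training_accuracy`, `XS`, `YS`, and `GAMMA` are introduced here for illustration only. The kernel hyperparameter is held fixed and only C is varied.

```python
import math


def rbf(a, b, gamma):
    """Gaussian (RBF) kernel on scalars; strictly positive definite for gamma > 0."""
    return math.exp(-gamma * (a - b) ** 2)


def train_svm_no_bias(xs, ys, C, gamma, sweeps=2000):
    """Dual coordinate ascent for a bias-free soft-margin SVM (an assumption
    made here for simplicity): maximize
        sum_i alpha_i - 0.5 * sum_ij alpha_i alpha_j y_i y_j K_ij
    subject to 0 <= alpha_i <= C.
    """
    n = len(xs)
    K = [[rbf(xs[i], xs[j], gamma) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            # Current decision value at training point i.
            f_i = sum(alpha[j] * ys[j] * K[i][j] for j in range(n))
            # Exact one-dimensional maximizer, then projection onto [0, C].
            step = (1.0 - ys[i] * f_i) / K[i][i]
            alpha[i] = min(max(alpha[i] + step, 0.0), C)
    return alpha


def decision(x, xs, ys, alpha, gamma):
    """Decision function f(x) = sum_i alpha_i y_i k(x_i, x), without bias."""
    return sum(a * y * rbf(x, xi, gamma) for a, y, xi in zip(alpha, ys, xs))


def training_accuracy(xs, ys, C, gamma):
    """Fraction of training points on the correct side of the decision surface."""
    alpha = train_svm_no_bias(xs, ys, C, gamma)
    correct = sum(
        1 for x, y in zip(xs, ys) if y * decision(x, xs, ys, alpha, gamma) > 0
    )
    return correct / len(xs)


# Hypothetical toy data: one positive point surrounded by negatives.
XS = [-2.0, -1.0, 0.0, 1.0, 2.0]
YS = [-1, -1, 1, -1, -1]
GAMMA = 0.5  # kernel hyperparameter, kept fixed throughout

for C in (0.01, 1000.0):
    print(f"C = {C:g}: training accuracy = {training_accuracy(XS, YS, C, GAMMA)}")
```

With the small value of C every multiplier is clipped at the upper bound, so the four negative points outweigh the single positive one and it is misclassified. With the large value the box constraint is inactive and the classifier separates all five points, although the kernel width never changed.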
