Multidimensional splines with infinite number of knots as SVM kernels

Radial basis function (RBF) kernels for SVMs have been routinely used in a wide range of classification problems, delivering consistently good performance whenever the kernel computations are numerically feasible (high-dimensional problems typically use linear kernels instead). One drawback of RBF kernels is the need to select a proper value of the kernel hyperparameter γ in addition to the standard SVM penalty parameter C; this selection process can lead to overfitting. Another, less obvious drawback of RBF kernels is their inherent non-optimality as approximating functions. To address these issues, we propose to extend the concept of polynomial splines (designed explicitly for approximation purposes) to multidimensional normalized splines with an infinite number of knots, and to use the resulting hyperparameter-free kernel SVMs in place of RBF kernel SVMs. We tested our approach on a number of standard classification datasets from the literature. The results suggest that, for problems of moderately large dimension, the new kernels mostly deliver better classification performance than the RBF kernel, while allowing faster computation (measured over large cross-validation grids) and reducing the chance of overfitting.
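To make the idea concrete, below is a minimal sketch of a hyperparameter-free spline-kernel SVM. It assumes the classical order-1 spline kernel with an infinite number of knots on [0, 1] (Vapnik's closed-form construction), takes the multidimensional kernel as a product of per-coordinate kernels, and normalizes it cosine-style; the paper's exact normalized construction may differ, and the helper names (spline_kernel_1d, normalized_spline_kernel) are illustrative only.

```python
# Sketch of a hyperparameter-free spline-kernel SVM (assumed construction).
# 1-D kernel: classical order-1 spline kernel with infinitely many knots on [0, 1].
# Multidimensional kernel: product over coordinates, then cosine normalization.
# This is an illustrative approximation, not the paper's exact formulation.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


def spline_kernel_1d(x, y):
    """Order-1 spline kernel with an infinite number of knots, for x, y in [0, 1]."""
    m = np.minimum(x, y)
    return 1.0 + x * y + x * y * m - (x + y) / 2.0 * m ** 2 + m ** 3 / 3.0


def spline_gram(X, Y):
    """Gram matrix of the product of per-coordinate 1-D spline kernels."""
    K = np.ones((X.shape[0], Y.shape[0]))
    for d in range(X.shape[1]):
        K *= spline_kernel_1d(X[:, d][:, None], Y[:, d][None, :])
    return K


def self_kernel(X):
    """K(x, x) for each row x, used for normalization."""
    return np.prod(spline_kernel_1d(X, X), axis=1)


def normalized_spline_kernel(X, Y):
    """Cosine-normalized multidimensional spline kernel: K(x,y)/sqrt(K(x,x)K(y,y))."""
    K = spline_gram(X, Y)
    return K / np.sqrt(np.outer(self_kernel(X), self_kernel(Y)))


# Usage: features scaled to [0, 1]; the kernel itself has no hyperparameter,
# so only the SVM penalty C remains (left at the sklearn default here).
X, y = load_breast_cancer(return_X_y=True)
X = MinMaxScaler().fit_transform(X)
clf = SVC(kernel=normalized_spline_kernel)  # custom callable kernel
print(cross_val_score(clf, X, y, cv=5).mean())
```

Because the kernel is a fixed function of the (normalized) inputs, cross-validation only needs to search over C, which is where the reported speedup on large cross-validation grids would come from under these assumptions.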
