Analysis of the Distance Between Two Classes for Tuning SVM Hyperparameters

An important step in the construction of a support vector machine (SVM) is the selection of optimal hyperparameters. This paper proposes a novel method for tuning the hyperparameters by maximizing the distance between two classes (DBTC) in the feature space. With a normalized kernel function, DBTC can serve as a class separability criterion, because it implicitly accounts for both the between-class separation and the within-class data distribution. Using DBTC as the objective function, we develop a gradient-based algorithm to search for the optimal kernel parameter. Geometric analysis and simulation results show that, with this criterion, the optimization algorithm and its initialization become very simple. Experimental results on synthetic and real-world data show that the proposed method consistently outperforms the existing hyperparameter tuning methods it is compared with.
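The following is a minimal sketch of the idea under stated assumptions. It assumes that (i) DBTC is measured as the squared Euclidean distance between the two class means in feature space, computed with the kernel trick; (ii) the normalized kernel is the Gaussian RBF kernel k(x, y) = exp(-γ||x - y||²), for which k(x, x) = 1; and (iii) plain gradient ascent on log γ with step halving can stand in for the paper's gradient-based search. The function names `dbtc_and_grad` and `tune_gamma`, the step-size schedule, and the toy data are hypothetical illustrative choices. None of them comes from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(x: Point, y: Point) -> float:
    """Squared Euclidean distance ||x - y||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def mean_kernel_and_grad(
    A: List[Point], B: List[Point], gamma: float
) -> Tuple[float, float]:
    """Average of the Gaussian kernel k(a, b) = exp(-gamma * ||a - b||^2)
    over all pairs (a, b), and the derivative of that average with respect
    to u = log(gamma)."""
    total, grad = 0.0, 0.0
    for a in A:
        for b in B:
            d2 = sq_dist(a, b)
            k = math.exp(-gamma * d2)
            total += k
            grad += -gamma * d2 * k  # dk/du = (dk/dgamma) * gamma
    n = len(A) * len(B)
    return total / n, grad / n


def dbtc_and_grad(
    pos: List[Point], neg: List[Point], gamma: float
) -> Tuple[float, float]:
    """Assumed DBTC: squared distance between the class means in feature space,
        ||m+ - m-||^2 = mean k(x+, x+') + mean k(x-, x-') - 2 mean k(x+, x-),
    returned together with its gradient with respect to log(gamma)."""
    kpp, gpp = mean_kernel_and_grad(pos, pos, gamma)
    knn, gnn = mean_kernel_and_grad(neg, neg, gamma)
    kpn, gpn = mean_kernel_and_grad(pos, neg, gamma)
    return kpp + knn - 2.0 * kpn, gpp + gnn - 2.0 * gpn


def tune_gamma(
    pos: List[Point], neg: List[Point], gamma0: float = 1.0,
    lr: float = 0.5, steps: int = 200, tol: float = 1e-8,
) -> Tuple[float, float]:
    """Hypothetical tuner: gradient ascent on DBTC over u = log(gamma), so
    gamma stays positive. A step is accepted only if DBTC does not decrease;
    otherwise the step size is halved. Returns (gamma, DBTC at gamma)."""
    u = math.log(gamma0)
    d, g = dbtc_and_grad(pos, neg, gamma0)
    for _ in range(steps):
        u_new = u + lr * g
        d_new, g_new = dbtc_and_grad(pos, neg, math.exp(u_new))
        if d_new >= d:
            u, d, g = u_new, d_new, g_new  # accept the ascent step
        else:
            lr *= 0.5                      # overshoot: shrink the step
        if abs(g) < tol or lr < tol:
            break
    return math.exp(u), d


if __name__ == "__main__":
    # Toy data (illustrative only): two well-separated 2-D clusters.
    pos = [(0.0, 0.0), (0.5, 0.2), (-0.3, 0.4), (0.1, -0.5)]
    neg = [(3.0, 0.0), (3.4, 0.3), (2.7, -0.2), (3.1, 0.6)]
    gamma, d = tune_gamma(pos, neg, gamma0=0.01)
    print(f"gamma = {gamma:.4f}, DBTC = {d:.4f}")
```

Optimizing over log γ keeps the kernel width positive without an explicit constraint. The step-halving rule keeps the objective monotonically non-decreasing. Because the kernel is nonnegative with k(x, x) = 1, this DBTC lies between 0 and 2. Each evaluation costs O(n²) kernel calls, which is acceptable for a sketch but would need caching or subsampling for large data sets.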
