Fast and Efficient Strategies for Model Selection of Gaussian Support Vector Machine

This paper proposes two strategies for selecting the kernel parameter (sigma) and the penalty coefficient (C) of Gaussian support vector machines (SVMs). Treating model parameter selection as a recognition problem in visual systems, we derive a direct setting formula for the kernel parameter by finding a visual scale at which the global and local structures of the given data set are preserved in the feature space and the difference between the two structures is maximized. For the penalty coefficient, we propose a heuristic selection algorithm that identifies the classification extent of each training datum during the sequential minimal optimization (SMO) procedure, a well-developed and commonly used algorithm for SVM training. We evaluate the suggested strategies in a series of experiments on 13 benchmark problems and three real-world data sets, comparing them with the traditional 5-fold cross-validation (5-CV) method and the recently developed radius-margin bound (RM) method. The evaluation shows that the new strategies outperform these existing methods in both efficiency and generalization capability, and that their performance is uniform and stable.
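To make the role of sigma as a scale concrete, the following minimal sketch evaluates a Gaussian kernel on a toy data set at three values of sigma. It does not implement the paper's setting formula, which is not reproduced here. The kernel form exp(-||x - y||^2 / (2 sigma^2)), the function names, and the toy points are all assumptions made for illustration.

```python
import math

def gaussian_kernel(x, y, sigma):
    """Gaussian kernel; the 2*sigma^2 scaling is an assumed convention."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def kernel_matrix(points, sigma):
    """Pairwise kernel values for a list of points."""
    return [[gaussian_kernel(p, q, sigma) for q in points] for p in points]

# Hypothetical toy data: two tight clusters that are far apart.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]

for sigma in (0.01, 1.0, 100.0):
    K = kernel_matrix(points, sigma)
    within = K[0][1]   # similarity of two points in the same cluster
    between = K[0][2]  # similarity of two points in different clusters
    print(f"sigma={sigma:>6}: within={within:.4f}, between={between:.4f}")
```

With a very small sigma every point looks dissimilar to every other, so local structure is lost. With a very large sigma every pair looks nearly identical, so global structure is lost. An intermediate scale keeps within-cluster similarity high and between-cluster similarity low, which is the kind of separation between the two structures that the abstract describes.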
