Determination of hyper-parameters for kernel-based classification and regression

Optimizing the hyper-parameters of a statistical procedure or machine learning task is a crucial step in minimizing its error. Unfortunately, this optimization usually requires many runs of the procedure and is therefore very costly. More detailed knowledge of how the performance of a procedure depends on its hyper-parameters can help to speed up this process. In this paper, we investigate kernel-based classifiers and regression estimators, which belong to the class of convex risk minimization methods from machine learning. In an empirical investigation, we analyze the response surfaces of nonlinear support vector machines and kernel logistic regression and compare the performance of several algorithms for determining hyper-parameters. The rest of the paper is organized as follows. Section 2 briefly outlines kernel-based classification and regression methods. Section 3 describes several methods for optimizing the hyper-parameters of statistical procedures. Section 4 presents some numerical examples, and Section 5 contains a discussion. All figures are given in the appendix.
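To make the notion of a response surface concrete, the following is a minimal sketch and not the procedure used in the paper. It assumes Gaussian-kernel ridge regression as a simple stand-in for the convex risk minimization methods studied here, a synthetic one-dimensional data set, and a plain grid search over a regularization parameter `lam` and a kernel width `gamma`. All function names, the data, and the grid values are illustrative assumptions. Each grid point requires a full refit, which is the cost the abstract refers to.

```python
import math
import random

def rbf(x, z, gamma):
    """Gaussian RBF kernel for scalar inputs."""
    return math.exp(-gamma * (x - z) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_krr(xs, ys, lam, gamma):
    """Kernel ridge regression: solve (K + n * lam * I) alpha = y."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j], gamma) + (n * lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, ys)

def predict(xs_train, alpha, gamma, x):
    """Evaluate the fitted kernel expansion at a new point x."""
    return sum(a * rbf(xt, x, gamma) for a, xt in zip(alpha, xs_train))

def validation_mse(train, valid, lam, gamma):
    """One point of the response surface: refit, then score on held-out data."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    alpha = fit_krr(xs, ys, lam, gamma)
    errs = [(predict(xs, alpha, gamma, x) - y) ** 2 for x, y in valid]
    return sum(errs) / len(errs)

def grid_search(train, valid, lams, gammas):
    """Evaluate every (lam, gamma) pair and return the minimizer and the surface."""
    surface = {(lam, g): validation_mse(train, valid, lam, g)
               for lam in lams for g in gammas}
    best = min(surface, key=surface.get)
    return best, surface

def make_data(n, rng):
    """Synthetic data (an assumption): noisy samples of sin(3x) on [-2, 2]."""
    data = []
    for _ in range(n):
        x = rng.uniform(-2.0, 2.0)
        data.append((x, math.sin(3.0 * x) + rng.gauss(0.0, 0.1)))
    return data

rng = random.Random(0)
train, valid = make_data(40, rng), make_data(40, rng)
lams = [10.0 ** k for k in range(-4, 1)]    # logarithmic grid for lam
gammas = [10.0 ** k for k in range(-2, 3)]  # logarithmic grid for gamma
best, surface = grid_search(train, valid, lams, gammas)
print("best (lam, gamma):", best, "validation MSE:", surface[best])
```

The `surface` dictionary holds the validation error at every grid point, so it can be inspected or plotted directly. A direct search method such as the Nelder-Mead simplex would call `validation_mse` on far fewer points than the full grid, at the risk of stopping in a local minimum of the surface.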
