Hyperparameter design criteria for support vector classifiers

Abstract The design of a support vector machine (SVM) classifier involves tuning a set of hyperparameters and requires an accurate estimate of the classifier's generalization performance. This paper describes the application of the maximal-discrepancy criterion to the hyperparameter-setting process and points out the advantages of this approach over existing theoretical frameworks. The resulting theoretical predictions are then compared with the empirical k-fold cross-validation method, which is arguably the best-performing current approach to the SVM design problem. Experimental results on a wide range of real-world testbeds show that the maximal-discrepancy method can notably narrow the gap that has so far separated theoretical and empirical estimates of a classifier's generalization error.
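To make the comparison concrete, the sketch below (not the authors' implementation) estimates the generalization error of an RBF-kernel SVM for a few hyperparameter settings in two ways: by adding a maximal-discrepancy penalty to the training error, and by 5-fold cross-validation. It assumes scikit-learn's SVC, binary labels in {0, 1}, an illustrative dataset, and the usual label-flipping recipe for computing the maximal-discrepancy penalty; because the SVM minimizes the hinge loss rather than the 0-1 error, the discrepancy is only approximately maximized.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def maximal_discrepancy_estimate(X, y, C, gamma, seed=0):
    # Training error plus a maximal-discrepancy penalty for an RBF SVM.
    # The penalty is obtained by splitting the sample into two halves,
    # flipping the labels of the first half, retraining, and measuring
    # the error gap between the halves (labels assumed to be in {0, 1}).
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    half = len(y) // 2
    i1, i2 = idx[:half], idx[half:2 * half]

    # Resubstitution (training) error on the full sample.
    clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X, y)
    train_err = np.mean(clf.predict(X) != y)

    # Training on S1 with flipped labels plus S2 as-is (approximately,
    # via the hinge loss) maximizes the discrepancy
    # err_S1(f) - err_S2(f) over the hypothesis class.
    X_md = np.concatenate([X[i1], X[i2]])
    y_md = np.concatenate([1 - y[i1], y[i2]])
    md = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_md, y_md)
    discrepancy = (np.mean(md.predict(X[i1]) != y[i1])
                   - np.mean(md.predict(X[i2]) != y[i2]))

    return train_err + discrepancy

X, y = load_breast_cancer(return_X_y=True)   # illustrative binary testbed
X = StandardScaler().fit_transform(X)

for C, gamma in [(1.0, 0.01), (10.0, 0.01), (100.0, 0.1)]:
    md_est = maximal_discrepancy_estimate(X, y, C, gamma)
    cv_err = 1.0 - cross_val_score(SVC(kernel="rbf", C=C, gamma=gamma),
                                   X, y, cv=5).mean()
    print(f"C={C:g} gamma={gamma:g}  max-discrepancy={md_est:.3f}  5-fold CV error={cv_err:.3f}")

The hyperparameter pair with the lowest estimated error would then be selected; in the paper's setting, the point of interest is how closely the theoretically motivated maximal-discrepancy estimate tracks the empirical cross-validation estimate.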
