Hyperparameter design criteria for support vector classifiers

Abstract The design of a support vector machine (SVM) classifier involves tuning a set of hyperparameters and requires an accurate estimate of the classifier's generalization performance. This paper describes the application of the maximal-discrepancy criterion to the hyperparameter-setting process and points out the advantages of this approach over existing theoretical frameworks. The resulting theoretical predictions are then compared with the empirical k-fold cross-validation method, which is arguably the best-performing current approach to the SVM design problem. Experimental results on a wide range of real-world testbeds show that the maximal-discrepancy method can notably narrow the gap that has so far separated theoretical and empirical estimates of a classifier's generalization error.
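To make the comparison concrete, the sketch below (not the authors' implementation) estimates the generalization error of an RBF-kernel SVM for a few hyperparameter settings in two ways: by adding a maximal-discrepancy penalty to the training error, and by 5-fold cross-validation. It assumes scikit-learn's SVC, binary labels in {0, 1}, an illustrative dataset, and the usual label-flipping recipe for computing the maximal-discrepancy penalty; because the SVM minimizes the hinge loss rather than the 0-1 error, the discrepancy is only approximately maximized.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def maximal_discrepancy_estimate(X, y, C, gamma, seed=0):
    # Training error plus a maximal-discrepancy penalty for an RBF SVM.
    # The penalty is obtained by splitting the sample into two halves,
    # flipping the labels of the first half, retraining, and measuring
    # the error gap between the halves (labels assumed to be in {0, 1}).
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    half = len(y) // 2
    i1, i2 = idx[:half], idx[half:2 * half]

    # Resubstitution (training) error on the full sample.
    clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X, y)
    train_err = np.mean(clf.predict(X) != y)

    # Training on S1 with flipped labels plus S2 as-is (approximately,
    # via the hinge loss) maximizes the discrepancy
    # err_S1(f) - err_S2(f) over the hypothesis class.
    X_md = np.concatenate([X[i1], X[i2]])
    y_md = np.concatenate([1 - y[i1], y[i2]])
    md = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_md, y_md)
    discrepancy = (np.mean(md.predict(X[i1]) != y[i1])
                   - np.mean(md.predict(X[i2]) != y[i2]))

    return train_err + discrepancy

X, y = load_breast_cancer(return_X_y=True)   # illustrative binary testbed
X = StandardScaler().fit_transform(X)

for C, gamma in [(1.0, 0.01), (10.0, 0.01), (100.0, 0.1)]:
    md_est = maximal_discrepancy_estimate(X, y, C, gamma)
    cv_err = 1.0 - cross_val_score(SVC(kernel="rbf", C=C, gamma=gamma),
                                   X, y, cv=5).mean()
    print(f"C={C:g} gamma={gamma:g}  max-discrepancy={md_est:.3f}  5-fold CV error={cv_err:.3f}")

The hyperparameter pair with the lowest estimated error would then be selected; in the paper's setting, the point of interest is how closely the theoretically motivated maximal-discrepancy estimate tracks the empirical cross-validation estimate.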
