Theoretical and Practical Model Selection Methods for Support Vector Classifiers

In this chapter, we review several methods for SVM model selection, derived from different approaches: some build on practical lines of reasoning but are not fully justified from a theoretical point of view; others rely on rigorous theoretical work but are of little help when applied to real-world problems, because the underlying hypotheses cannot be verified or the results they produce are uninformative. Our objective is to shed some light on these issues by carefully analyzing the most well-known methods and testing some of them on standard benchmarks to evaluate their effectiveness.
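As a concrete illustration of the practical end of the spectrum, the sketch below shows k-fold cross-validation over a grid of SVM hyperparameters, one of the model-selection procedures of the kind reviewed here. It is a minimal example, not the chapter's actual experimental protocol: the dataset, the grid values, and the choice of k = 5 are illustrative assumptions, using the scikit-learn implementations of the SVM and of grid search.

```python
# Minimal sketch: select (C, gamma) for an RBF-kernel SVM by k-fold
# cross-validation. Dataset, grid, and k = 5 are illustrative choices,
# not the benchmarks or settings used in the chapter.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Standardize features, then fit an RBF-kernel SVM.
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
grid = {
    "svc__C": [0.1, 1, 10, 100],       # soft-margin trade-off
    "svc__gamma": [1e-3, 1e-2, 1e-1],  # RBF kernel width
}

# 5-fold cross-validation estimates the generalization accuracy of each
# (C, gamma) pair; the pair with the best average score is selected.
search = GridSearchCV(pipe, grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Procedures of this kind are easy to apply and widely used, but, as discussed in what follows, the cross-validation estimate they rely on is only partially backed by theory, which is precisely the tension this chapter examines.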
