Paper information - Model selection for support vector machines: Advantages and disadvantages of the Machine Learning Theory

A common belief is that Machine Learning Theory (MLT) is not very useful, in practice, for performing effective SVM model selection. This belief is supported by experience, because well-known hold-out methods such as cross-validation, leave-one-out, and the bootstrap usually achieve better results than methods derived from MLT. We show in this paper that, in a small-sample setting, i.e., when the dimensionality of the data is larger than the number of samples, a careful application of MLT can outperform other methods in selecting the optimal hyperparameters of an SVM.
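For orientation, the hold-out family mentioned above can be illustrated with a minimal sketch of k-fold cross-validation used to pick the regularization hyperparameter C of a linear SVM when there are more features than samples. This is not the MLT-based procedure the paper proposes, nor the authors' code. The toy data, the plain subgradient-descent trainer, and all names (`train_linear_svm`, `select_C_by_cv`, the grid values) are assumptions made only for illustration.

```python
import random

def decision(w, b, x):
    """Linear decision value w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, C, epochs=100, lr=0.01):
    """Subgradient descent on 0.5*||w||^2 + C * sum(hinge); labels in {-1, +1}.
    A deliberately simple stand-in for a real SVM solver (assumption)."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = list(w), 0.0          # gradient of the regularizer is w itself
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1:   # margin violated: hinge term is active
                for j in range(d):
                    gw[j] -= C * yi * xi[j]
                gb -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def error_rate(model, X, y):
    """Fraction of points whose predicted sign disagrees with the label."""
    w, b = model
    return sum(1 for xi, yi in zip(X, y) if yi * decision(w, b, xi) <= 0) / len(y)

def kfold_indices(n, k, rng):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def select_C_by_cv(X, y, grid, k=5, seed=0):
    """Return the C in `grid` with the lowest mean held-out error, plus all scores."""
    folds = kfold_indices(len(y), k, random.Random(seed))
    scores = {}
    for C in grid:
        errs = []
        for fold in folds:
            held = set(fold)
            train = [i for i in range(len(y)) if i not in held]
            model = train_linear_svm([X[i] for i in train], [y[i] for i in train], C)
            errs.append(error_rate(model, [X[i] for i in fold], [y[i] for i in fold]))
        scores[C] = sum(errs) / k
    return min(grid, key=lambda C: scores[C]), scores

# Hypothetical small-sample toy problem: n = 20 samples, d = 50 features (d > n).
# Only the first 5 features carry class information; the rest are noise.
rng = random.Random(1)
n, d = 20, 50
y = [1 if i % 2 == 0 else -1 for i in range(n)]
X = [[rng.gauss(0.5 * yi if j < 5 else 0.0, 1.0) for j in range(d)] for yi in y]

best_C, cv_scores = select_C_by_cv(X, y, grid=[0.01, 0.1, 1.0])
print("selected C:", best_C)
print("mean held-out error per C:", cv_scores)
```

With so few samples, each fold holds only four points, so the per-fold error estimates are coarse and can vary noticeably with the random split. That variability is the kind of weakness of resampling estimates in the small-sample regime that motivates looking at theory-based alternatives.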
