Rademacher Complexity and Structural Risk Minimization: An Application to Human Gene Expression Datasets

In this paper, we address model selection for Support Vector Classifiers through in-sample methods, which are particularly appealing in the small-sample regime, i.e., when only a few high-dimensional patterns are available. In particular, we describe how a trimmed hinge loss function can be applied to in-sample approaches based on Rademacher Complexity and Maximal Discrepancy. We also show that, when classifying Human Gene Expression datasets, the selected classifiers outperform those obtained with other state-of-the-art in-sample and out-of-sample model selection techniques.
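
As a rough illustration of the two ingredients named above, the sketch below clips the hinge loss to the interval [0, 1] and estimates an empirical Rademacher complexity by Monte Carlo sampling of random signs. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the finite hypothesis class, the `loss_table` layout, the function names, and the 2/n normalization are all chosen here for illustration only. The clipping matters because bounds of this kind are usually stated for bounded losses, while the plain hinge loss is unbounded.

```python
import random


def trimmed_hinge(margin):
    """Hinge loss 1 - margin, clipped to the interval [0, 1]."""
    return min(1.0, max(0.0, 1.0 - margin))


def empirical_rademacher(loss_table, n_draws=1000, seed=0):
    """Monte Carlo estimate of E_sigma max_f (2/n) * sum_i sigma_i * loss_table[f][i].

    loss_table[f][i] holds the loss of hypothesis f on sample i. The finite
    hypothesis class and the 2/n scaling are illustrative assumptions.
    """
    rng = random.Random(seed)
    n = len(loss_table[0])
    total = 0.0
    for _ in range(n_draws):
        # Draw one vector of independent, uniformly random +1/-1 signs.
        sigma = [rng.choice((-1, 1)) for _ in range(n)]
        # Supremum over the (finite) class of the sign-weighted average loss.
        total += max(
            2.0 / n * sum(s * l for s, l in zip(sigma, losses))
            for losses in loss_table
        )
    return total / n_draws


# Hypothetical margins y_i * f(x_i) for two classifiers on three samples.
margins = [[2.0, 0.5, -3.0], [1.5, 1.2, 3.0]]
loss_table = [[trimmed_hinge(m) for m in row] for row in margins]
estimate = empirical_rademacher(loss_table, n_draws=200)
```

In an in-sample scheme, a complexity term of this kind would typically be added to the empirical error of each candidate hypothesis space, and the space with the smallest resulting bound would be selected.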
