Unlabeled patterns to tighten Rademacher complexity error bounds for kernel classifiers

We derive in this work new upper bounds for estimating the generalization error of kernel classifiers, that is the misclassification rate that the models will perform on new and previously unseen data. Though this paper is more targeted towards the error estimation topic, the generalization error can be obviously exploited, in practice, for model selection purposes as well. The derived bounds are based on Rademacher complexity and result to be particularly useful when a set of unlabeled samples are available, in addition to the (labeled) training examples: we will show that, by exploiting further unlabeled patterns, the confidence term of the conventional Rademacher complexity bound can be reduced by a factor of three. Moreover, the availability of unlabeled examples allows also to obtain further improvements by building localized versions of the hypothesis class containing the optimal classifier.

[1]  S. T. Buckland,et al.  An Introduction to the Bootstrap. , 1994 .

[2]  David Page Comparative Data Mining for Microarrays : A Case Study Based on Multiple Myeloma , 2002 .

[3]  Davide Anguita,et al.  In-sample model selection for Support Vector Machines , 2011, The 2011 International Joint Conference on Neural Networks.

[4]  U. Alon,et al.  Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[5]  Simon Haykin,et al.  Neural Networks: A Comprehensive Foundation , 1998 .

[6]  S. Boucheron,et al.  On concentration of self-bounding functions , 2009 .

[7]  J. Platt Sequential Minimal Optimization : A Fast Algorithm for Training Support Vector Machines , 1998 .

[8]  R. Spang,et al.  Predicting the clinical status of human breast cancer by using gene expression profiles , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[9]  Ayhan Demiriz,et al.  Semi-Supervised Support Vector Machines , 1998, NIPS.

[10]  S. Boucheron,et al.  Concentration inequalities using the entropy method , 2003 .

[11]  Peter L. Bartlett,et al.  Model Selection and Error Estimation , 2000, Machine Learning.

[12]  Constantin F. Aliferis,et al.  A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis , 2004, Bioinform..

[13]  Vladimir Vapnik,et al.  Chervonenkis: On the uniform convergence of relative frequencies of events to their probabilities , 1971 .

[14]  G. Casella,et al.  Statistical Inference , 2003, Encyclopedia of Social Network Analysis and Mining.

[15]  André Elisseeff,et al.  Stability and Generalization , 2002, J. Mach. Learn. Res..

[16]  Jason Weston,et al.  Trading convexity for scalability , 2006, ICML.

[17]  Davide Anguita,et al.  The Impact of Unlabeled Patterns in Rademacher Complexity Theory for Kernel Classifiers , 2011, NIPS.

[18]  Davide Anguita,et al.  Maximal Discrepancy for Support Vector Machines , 2011, ESANN.

[19]  Peter L. Bartlett,et al.  Rademacher and Gaussian Complexities: Risk Bounds and Structural Results , 2003, J. Mach. Learn. Res..

[20]  Colin McDiarmid,et al.  Surveys in Combinatorics, 1989: On the method of bounded differences , 1989 .

[21]  Yoshua Bengio,et al.  An empirical evaluation of deep architectures on problems with many factors of variation , 2007, ICML '07.

[22]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[23]  Nello Cristianini,et al.  Margin Distribution and Soft Margin , 2000 .

[24]  Davide Anguita,et al.  In-Sample and Out-of-Sample Model Selection and Error Estimation for Support Vector Machines , 2012, IEEE Transactions on Neural Networks and Learning Systems.

[25]  Graziano Pesole,et al.  On the statistical assessment of classifiers using DNA microarray data , 2006, BMC Bioinformatics.

[26]  Constantin F. Aliferis,et al.  GEMS: A system for automated cancer diagnosis and biomarker discovery from microarray gene expression data , 2005, Int. J. Medical Informatics.

[27]  P. Bartlett,et al.  Local Rademacher complexities , 2005, math/0508275.

[28]  Dariu Gavrila,et al.  An Experimental Study on Pedestrian Classification , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[30]  J. Mesirov,et al.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. , 1999, Science.

[31]  Alexander Zien,et al.  Semi-Supervised Learning , 2006 .

[32]  Sylvain Arlot,et al.  A survey of cross-validation procedures for model selection , 2009, 0907.4728.