An improved analysis of the Rademacher data-dependent bound using its self bounding property

The problem of assessing the performance of a classifier in the finite-sample setting was addressed by Vapnik in his seminal work using data-independent measures of complexity. More recently, several authors have addressed the same problem by proposing data-dependent measures, which tighten previous results by taking into account the actual data distribution. In this framework, we derive data-dependent bounds on the generalization ability of a classifier by exploiting the Rademacher Complexity and recent concentration results: in addition to being appealing for practical purposes, as they rely on empirical quantities only, these bounds improve on previously known results.
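To illustrate what "empirical quantities only" can mean here, the following is a minimal sketch of a Monte Carlo estimate of the empirical Rademacher complexity of a finite set of classifiers. It is not the bound derived in the paper. The function name `empirical_rademacher`, its parameters, and the normalization (no factor of 2, no absolute value, applied to classifier outputs rather than losses) are assumptions made for illustration; conventions differ across the literature.

```python
import random


def empirical_rademacher(predictions, n_draws=2000, seed=0):
    """Monte Carlo estimate of E_sigma[ max_f (1/n) * sum_i sigma_i * f(x_i) ].

    predictions: one row per classifier f (hypothetical finite class),
                 each row holding the +1/-1 outputs of f on the same n samples.
    n_draws:     number of random sign vectors sigma to average over.
    """
    rng = random.Random(seed)
    n = len(predictions[0])
    total = 0.0
    for _ in range(n_draws):
        # Rademacher variables: independent, uniform over {-1, +1}.
        sigma = [rng.choice((-1, 1)) for _ in range(n)]
        # Best correlation with the random signs achievable within the class.
        total += max(
            sum(s * p for s, p in zip(sigma, row)) / n for row in predictions
        )
    return total / n_draws


if __name__ == "__main__":
    single = [[1, -1, 1, 1]]
    richer = single + [[-1, 1, -1, -1], [1, 1, -1, 1]]
    print(empirical_rademacher(single))  # close to 0 for a single classifier
    print(empirical_rademacher(richer))  # no smaller than the value above
```

The estimate depends only on the classifier outputs on the observed sample, not on the unknown data distribution. With a fixed seed the same sign vectors are reused, so enlarging the class can never decrease the estimate.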
