On the Importance of Small Coordinate Projections

It has recently been shown that sharp generalization bounds can be obtained when the function class from which the algorithm chooses its hypotheses is "small" in the sense that its Rademacher averages are small. We show that a new, more general principle guarantees good generalization bounds. The new principle requires that random coordinate projections of the function class, evaluated on random samples, are "small" with high probability, and that the random class of functions allows symmetrization. As an example, we prove that this geometric property of the function class is exactly the reason why two recently proposed frameworks, luckiness (Shawe-Taylor et al., 1998) and algorithmic luckiness (Herbrich and Williamson, 2002), can be used to establish generalization bounds.
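
For concreteness, the standard definitions behind these notions are as follows (the notation here is ours, not fixed by the abstract). Given a sample X_1, ..., X_n and independent Rademacher signs \varepsilon_1, ..., \varepsilon_n (uniform on {-1, +1}), the empirical Rademacher average of a class F is

    R_n(F) = \mathbb{E}_\varepsilon \sup_{f \in F} \frac{1}{n} \Big| \sum_{i=1}^n \varepsilon_i f(X_i) \Big|,

and the coordinate projection of F onto the sample is the subset of \mathbb{R}^n

    F_{|X} = \{ (f(X_1), \ldots, f(X_n)) : f \in F \}.

The principle above weakens "R_n(F) is small" to the requirement that random coordinate projections of this set are small with high probability.

As a rough illustration (our own sketch, not code from the paper; the function name and the toy class below are hypothetical), the empirical Rademacher average of a finite class can be estimated by Monte Carlo directly from its coordinate projection:

import numpy as np

def empirical_rademacher_average(projection, n_trials=10_000, seed=0):
    """Monte Carlo estimate of E_eps sup_{f in F} (1/n) |sum_i eps_i f(X_i)|.

    projection: array of shape (|F|, n); row j is (f_j(X_1), ..., f_j(X_n)),
    i.e. the coordinate projection of the (finite) class F onto the sample.
    """
    rng = np.random.default_rng(seed)
    n = projection.shape[1]
    # Draw i.i.d. Rademacher signs, one length-n vector per trial.
    signs = rng.choice([-1.0, 1.0], size=(n_trials, n))
    # For each sign vector, take the supremum over the class of |<signs, f>| / n.
    sups = np.abs(signs @ projection.T).max(axis=1) / n
    return sups.mean()

# Toy class: two functions evaluated on a sample of size 5 (values made up).
F_on_sample = np.array([[1.0, -1.0, 1.0, 1.0, -1.0],
                        [0.5, 0.5, -0.5, 0.5, 0.5]])
print(empirical_rademacher_average(F_on_sample))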

[1] Shahar Mendelson, et al. Improving the sample complexity using global data. IEEE Trans. Inf. Theory, 2002.

[2] Shahar Mendelson, et al. Rademacher averages and phase transitions in Glivenko-Cantelli classes. IEEE Trans. Inf. Theory, 2002.

[3] Ralf Herbrich and Robert C. Williamson. Algorithmic Luckiness. J. Mach. Learn. Res., 2002.

[4] V. Milman, et al. Asymptotic Theory of Finite Dimensional Normed Spaces, 1986.

[5] Shahar Mendelson, et al. A Few Notes on Statistical Learning Theory. Machine Learning Summer School, 2002.

[6] Colin McDiarmid. On the method of bounded differences. In Surveys in Combinatorics, 1989.

[7] S. Boucheron, et al. Concentration inequalities using the entropy method. Ann. Probab., 2003.

[8] M. Ledoux. The Concentration of Measure Phenomenon, 2001.

[9] Peter L. Bartlett, et al. The Importance of Convexity in Learning with Squared Loss. IEEE Trans. Inf. Theory, 1998.

[10] Noga Alon, et al. Scale-sensitive dimensions, uniform convergence, and learnability. J. ACM, 1997.

[11] G. Schechtman, et al. Remarks on Talagrand’s deviation inequality for Rademacher functions, 1990. arXiv:math/9201208.

[12] Michael I. Jordan, et al. Convexity, Classification, and Risk Bounds, 2006.

[13] G. Pisier. The Volume of Convex Bodies and Banach Space Geometry, 1989.

[14] John Shawe-Taylor, et al. Structural Risk Minimization Over Data-Dependent Hierarchies. IEEE Trans. Inf. Theory, 1998.

[15] Vladimir Vapnik and Alexey Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl., 1971.

[16] Y. Gat. A bound concerning the generalization ability of a certain class of learning algorithms, 1999.

[17] M. Talagrand. Concentration of measure and isoperimetric inequalities in product spaces, 1994. arXiv:math/9406212.

[18] M. Talagrand. Majorizing measures: the generic chaining, 1996.

[19] S. Mendelson, et al. Entropy and the combinatorial dimension, 2002. arXiv:math/0203275.

[20] Shahar Mendelson, et al. Random Subclass Bounds. In COLT, 2003.

[21] P. Bartlett, et al. Local Rademacher complexities, 2005. arXiv:math/0508275.

[22] Don R. Hush, et al. Machine Learning with Data Dependent Hypothesis Classes. J. Mach. Learn. Res., 2002.

[23] O. Bousquet. A Bennett concentration inequality and its application to suprema of empirical processes, 2002.