Complexities of convex combinations and bounding the generalization error in classification

We introduce and study several measures of complexity of functions from the convex hull of a given base class. These complexity measures take into account the sparsity of the weights of a convex combination as well as certain clustering properties of the base functions involved in it. We prove new upper confidence bounds on the generalization error of ensemble (voting) classification algorithms that use the new complexity measures together with the empirical distributions of classification margins. These bounds provide a better explanation of the generalization performance of large-margin classification methods.
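As a rough illustration of two ingredients named above, the following Python sketch builds a voting classifier f = Σ_j λ_j h_j as a convex combination of ±1-valued base classifiers. It then computes the empirical margin distribution F_n(δ) = (1/n)·#{i : y_i f(x_i) ≤ δ} and a crude sparsity proxy for the weights. The decision-stump base class, the function names, and the "effective support size" measure are hypothetical choices made for this example. They are not the complexity measures defined in the paper, and the sketch does not model the clustering of base functions.

```python
from typing import Callable, List, Sequence

# Assumption: binary labels in {-1, +1}; each base classifier maps a real input to -1 or +1.
BaseClassifier = Callable[[float], int]


def convex_combination(weights: Sequence[float],
                       base: Sequence[BaseClassifier]) -> Callable[[float], float]:
    """Return f(x) = sum_j w_j * h_j(x) for nonnegative weights summing to 1."""
    if len(weights) != len(base):
        raise ValueError("need exactly one weight per base classifier")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to 1")

    def f(x: float) -> float:
        return sum(w * h(x) for w, h in zip(weights, base))

    return f


def empirical_margin_distribution(f: Callable[[float], float],
                                  xs: Sequence[float],
                                  ys: Sequence[int],
                                  delta: float) -> float:
    """Fraction of sample points whose margin y * f(x) is at most delta."""
    margins: List[float] = [y * f(x) for x, y in zip(xs, ys)]
    return sum(1 for m in margins if m <= delta) / len(margins)


def effective_support_size(weights: Sequence[float], eps: float) -> int:
    """Hypothetical sparsity proxy: the smallest k such that the k largest
    weights carry total mass at least 1 - eps."""
    total = 0.0
    for k, w in enumerate(sorted(weights, reverse=True), start=1):
        total += w
        if total >= 1.0 - eps - 1e-12:  # small slack for floating-point rounding
            return k
    return len(weights)


def stump(threshold: float) -> BaseClassifier:
    """Hypothetical base class: +1 to the right of the threshold, -1 otherwise."""
    return lambda x: 1 if x > threshold else -1


# Toy example: three stumps combined with weights 0.6, 0.3, 0.1.
base = [stump(0.0), stump(0.5), stump(-0.5)]
weights = [0.6, 0.3, 0.1]
f = convex_combination(weights, base)
xs = [-1.0, -0.2, 0.3, 1.0]
ys = [-1, -1, 1, 1]

# The margins are approximately 1.0, 0.8, 0.4 and 1.0.
print(empirical_margin_distribution(f, xs, ys, 0.0))  # 0.0
print(empirical_margin_distribution(f, xs, ys, 0.5))  # 0.25
print(effective_support_size(weights, 0.15))         # 2
```

In this toy example every training point is classified correctly (no margin is at or below 0), but one point has a margin below 0.5. Two of the three stumps already carry 90% of the weight, so with a 15% tolerance the combination behaves like a two-term vote.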
