Simultaneous adaptation to the margin and to complexity in classification

We consider the problem of adaptation to the margin and to complexity in binary classification, and we propose a learning method with a numerically simple aggregation step. Procedures that adapt to both the margin and the complexity usually rely on empirical risk minimization or Rademacher complexities, which lead to numerical difficulties. On the other hand, there exist classifiers that are easy to compute and converge with fast rates, but they are not adaptive. By combining such classifiers with our aggregation procedure, we obtain numerically realizable adaptive classifiers that converge with fast rates.
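To make the idea concrete, the following is a minimal sketch of one way such an aggregation step could look: a finite family of pre-trained, easy-to-compute classifiers is combined through exponential weights computed from their empirical risks on a held-out sample. The exponential-weighting rule, the temperature parameter beta, and the function names are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def aggregate_classifiers(classifiers, X_val, y_val, beta=1.0):
    """Combine a finite family of classifiers by exponential weighting.

    classifiers : list of callables mapping an input array to labels in {-1, +1}
    X_val, y_val: held-out sample used only for the aggregation step
    beta        : temperature controlling how sharply the weights concentrate
                  (an illustrative choice here)
    """
    n = len(y_val)
    # Empirical 0-1 risk of each candidate on the held-out sample.
    risks = np.array([np.mean(clf(X_val) != y_val) for clf in classifiers])
    # Exponential weights: cheap to compute, no empirical risk minimization
    # over a large class is required.
    w = np.exp(-beta * n * risks)
    w /= w.sum()

    def aggregated(X):
        # Weighted vote over the candidates' {-1, +1} predictions.
        votes = np.stack([w_k * clf(X) for w_k, clf in zip(w, classifiers)])
        return np.sign(votes.sum(axis=0))

    return aggregated, w
```

In this sketch, the candidates would be the non-adaptive but fast-rate classifiers mentioned above, each computed for a different value of the unknown margin or complexity parameter; the aggregation step only requires evaluating them on the held-out sample.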
