Aggregation via empirical risk minimization

Given a finite set F of estimators, the problem of aggregation is to construct a new estimator whose risk is as close as possible to the risk of the best estimator in F. It was conjectured that empirical minimization performed in the convex hull of F is an optimal aggregation method, but we show that this conjecture is false. Nevertheless, we prove that empirical minimization in the convex hull of a well-chosen, empirically determined subset of F is an optimal aggregation method.
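The two-stage idea can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: the loss is taken to be squared loss, the sample is split in half, the subset of F is chosen by keeping every estimator whose empirical risk on the first half is within a hypothetical tolerance `slack` of the smallest one, and the minimization over the convex hull is done with a Frank-Wolfe loop chosen here for brevity. The function names and the selection rule are illustrative only and should not be read as the procedure analysed in the paper.

```python
import numpy as np


def empirical_risk(preds, y, w):
    """Mean squared error of the convex combination preds @ w."""
    return float(np.mean((preds @ w - y) ** 2))


def erm_convex_hull(preds, y, n_iter=500):
    """Minimize the empirical squared loss over the simplex of weights.

    preds is an (n, M) array whose column j holds the predictions of
    estimator j. Frank-Wolfe with exact line search is used as a simple
    solver (an assumption of this sketch, not the paper's choice).
    """
    n, m = preds.shape
    single_risks = np.mean((preds - y[:, None]) ** 2, axis=0)
    w = np.zeros(m)
    w[np.argmin(single_risks)] = 1.0  # start at the best single estimator
    for _ in range(n_iter):
        residual = preds @ w - y
        grad = (2.0 / n) * (preds.T @ residual)
        j = int(np.argmin(grad))      # vertex that minimizes the linearization
        d = -w.copy()
        d[j] += 1.0                   # direction e_j - w
        pd = preds @ d
        denom = (2.0 / n) * (pd @ pd)
        if denom <= 1e-15:
            break
        # Exact minimizer of the quadratic along d, clipped to [0, 1]
        gamma = float(np.clip(-(grad @ d) / denom, 0.0, 1.0))
        if gamma == 0.0:
            break
        w += gamma * d
    return w


def preselect(preds, y, slack):
    """Hypothetical rule: keep estimators within `slack` of the best empirical risk."""
    risks = np.mean((preds - y[:, None]) ** 2, axis=0)
    return np.flatnonzero(risks <= risks.min() + slack)


def aggregate(preds, y, slack):
    """Select a subset on the first half, then run hull ERM on the second half."""
    half = len(y) // 2
    kept = preselect(preds[:half], y[:half], slack)
    w = np.zeros(preds.shape[1])
    w[kept] = erm_convex_hull(preds[half:][:, kept], y[half:])
    return w
```

Calling `aggregate(preds, y, slack)` returns a weight vector over all M estimators that is zero outside the selected subset. How `slack` should scale with the sample size and with M is exactly the kind of detail the abstract leaves open, so it is left as a free parameter here.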
